I have a dataframe in R. Either the df has two columns with values, or it has the dimensions 0,0.
If the dataframe has columns with values, I want to keep these values, but if the dimensions are 0,0, I want to create two columns with one row containing 0,0 values.
Either it looks like this:
start = c (2, 10, 20)
end = c(5, 15, 25)
dataframe = as.data.frame (cbind (start, end))
-> if the df looks like this, it should be retained
Or like this:
start = c ()
end = c()
dataframe = as.data.frame (cbind (start, end))
-> if the df looks like this, a row with (0,0) should be added in the first row.
I tried ifelse
dataframe_new = ifelse (nrow(dataframe) == 0, cbind (start = 0,end =0) , dataframe)
But if the dataframe is not empty, only the value of the first row and column remains. If the dataframe is empty, there is only one 0.
Instead of the function ifelse, you should be using an if ... else statement here. ifelse is vectorized: it creates a vector whose elements are chosen element-wise according to a logical vector of the same length as the output. That's not what you have here. You have a simple branching condition (if the data frame has zero rows do one thing, otherwise do something else), and that is when you use if(condition) {do something} else {do another thing}
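To see why ifelse collapses the result, here is a minimal sketch. The vector x is a made-up example rather than the data from the question. The output of ifelse has the same shape as the test, so a test of length one yields a single element.

```r
# Hypothetical example vector, used only for illustration
x <- c(2, 10, 20)

# ifelse() returns something shaped like the test, so a length-one
# test keeps only the first element of the chosen branch
res_ifelse <- ifelse(length(x) > 0, x, 0)

# if ... else returns the chosen branch as a whole
res_if <- if (length(x) > 0) x else 0

print(res_ifelse)
print(res_if)
```

The same thing happens with a data frame as one of the branches, which is why only a single value survived in the original attempt.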
If you need to use the code a lot, you could save yourself time by putting it in a simple function:
fill_empty <- function(x) {
if(nrow(x) == 0) data.frame(start = 0, end = 0) else x
}
Let's test this on your examples:
start = c (2, 10, 20)
end = c(5, 15, 25)
dataframe = as.data.frame (cbind (start, end))
fill_empty(dataframe)
#> start end
#> 1 2 5
#> 2 10 15
#> 3 20 25
And
start = c ()
end = c()
dataframe = as.data.frame (cbind (start, end))
fill_empty(dataframe)
#> start end
#> 1 0 0
Created on 2022-09-19 with reprex v2.0.2
I'm currently having an issue where I'm trying to nest simulated data for an efficient frontier inside a tibble containing all 250 simulations. The tibble will have one column named "sim" which indicates the number of the simulation, i.e. the rows in this column run from 1 to 250. The other column should contain the nested simulation data, which is a 3x123 tibble for each simulation. (Really hope this makes sense).
I've tried to replicate the problem such that you don't need all of the previous code and data to see the issue. Problem is that the nested data is saved as a list:
library(tidyverse)
counter = 0
table <- tibble(sim = 1:250, obs = NA)
for(i in (1:250)){
counter = counter + 1
tibble <- tibble(a = NA, b = 1:113, c = 2, d = 3)
tibble$a <- counter
nested_tibble <- tibble %>% nest(data = -a) %>% select(-a)
table$obs[i] <- nested_tibble
}
In this simplified reproducible example the values in the tibble are identical, whereas in the assignment I'm working on, the tibble contains values for the efficient frontier. Variable 'a' in the tibble corresponds to the simulation number, and this is the variable I use to nest the efficient frontier. Afterwards I wish to remove this variable a and insert the nested tibble in the corresponding 'obs' field, which is currently NA.
I really hope this makes sense. I'm still very new with R and coding. If you need any additional documentation please let me know.
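Your nested_tibble is a one-column tibble, i.e. a list whose single element is the list-column data that holds the nested tibble. Double bracket notation, nested_tibble[[1]], extracts that list-column, so assigning it to table$obs[i] stores the tibble itself in row i. So to get the result you want you can change your loop as follows: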
counter = 0
table <- tibble(sim = 1:250, obs = NA)
for(i in (1:250)){
counter = counter + 1
tibble <- tibble(a = NA, b = 1:113, c = 2, d = 3)
tibble$a <- counter
nested_tibble <- tibble %>% nest(data = -a) %>% select(-a)
table$obs[i] <- nested_tibble[[1]]
}
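As an alternative sketch, assuming it is acceptable to create obs as a list-column from the start, each simulation's tibble can be stored directly with [[ ]] and no nest() step. The name n_sims is hypothetical and is set to 3 here only to keep the example small.

```r
library(tibble)

# Hypothetical number of simulations, shortened for illustration
n_sims <- 3

# Pre-allocate obs as an empty list so each cell can hold a tibble
table <- tibble(sim = seq_len(n_sims), obs = vector("list", n_sims))

for (i in seq_len(n_sims)) {
  # Placeholder data standing in for one simulated efficient frontier
  sim_data <- tibble(b = 1:113, c = 2, d = 3)
  table$obs[[i]] <- sim_data
}

print(table)
```

Each table$obs[[i]] is then itself a tibble, and tidyr::unnest(table, obs) would expand it back into long form.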
So I have these values:
This dataframe is 56,000 rows x 1 column
and
this matrix has 56,000 rows and 2 columns
Essentially, what I want to do is count how many times the value in a row of the data frame is greater than the values in the same row of the matrix.
EX: 8.34 > 2.05, so i is incremented by 1, then 8.34 > -9.15, i is incremented by 1 again. 4.902 > .87, i incremented by 1 again.
So this is my code:
#Question 3 count times observed is different than null
compareObservedNull = function(x, set1, set2){
i = 0
if(x[set1] > x[set2]){
i = i + 1
}
}
observedGreaterNum = apply(MARGIN = 1,
FUN = compareObservedNull,
tOBSERVEDDF,
tNullDistributionMatrix)
When running my code I get
Error in x[set1] : only 0's may be mixed with negative subscripts
Is there a function implemented in R to compare values at the row level?
Basically, all you need is the number of times a value df[i] is greater than mat[i,j]. One solution is to convert the data.frame (after adding a second column equal to the first) into a matrix, compare it with the matrix, and sum up the resulting logical values. One thing to keep in mind is that most base R functions are vectorised, especially the basic ones such as [, +, *, -, >, ==.
df$V2 = df$V1
sum(as.matrix(df) > mat)
#> [1] 175
Data
set.seed(1)
df <- data.frame(V1=rnorm(100, 4,4))
mat <- matrix(rnorm(200), nrow=100)
If your dataframe is called dat with column name col_name and matrix mat you could do :
dat$result <- rowSums(dat$col_name > mat, na.rm = TRUE)
result will hold, for each row, the number of values in that row of mat that col_name exceeds.
If you want the total count, you can sum the new column.
sum(dat$result)
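A small worked sketch of this comparison, using the three values quoted in the question (8.34 vs 2.05 and -9.15, 4.902 vs 0.87). The fourth matrix entry, 6.00, is made up so that the matrix is complete. The names obs and null_mat are hypothetical.

```r
# Observed values, one per row
obs <- c(8.34, 4.902)

# Null matrix: row 1 is (2.05, -9.15), row 2 is (0.87, 6.00);
# the 6.00 is an invented value for illustration
null_mat <- matrix(c(2.05, 0.87, -9.15, 6.00), nrow = 2)

# obs is recycled down each column, so obs[i] is compared
# with every entry in row i of null_mat
greater <- obs > null_mat

per_row <- rowSums(greater)  # count per row
total <- sum(greater)        # overall count

print(per_row)
print(total)
```

This relies on obs having exactly as many elements as null_mat has rows, so that the recycling lines up with the columns.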
Here is the dataframe
sampledf = data.frame(timeinterval = c(1:120), hour = c(rep(NA, times = 85), 1, rep(NA, times = 5), 1, rep(NA, times = 4),1, rep(NA, times = 4), 1, rep(NA, times = 18)))
I want to replace the NAs in column hour such that values between 86th row and 92 (inclusive) and then between 97 and 102 (inclusive) should all be 1.
Here is what I've tried so far:
1. Getting the list of rownames with value 1 in hour column
2. Looping through (This is what is not working!)
ones = which(sampledf$hour == 1)
n = (length(ones)+1)/2
chunk <- function(ones,n) split(ones, cut(seq_along(ones), n, labels = FALSE))
y = chunk(ones,n)
for (i in y) {
sampledf$Hour[c(y$i[1]:y$i[2])] == 1
}
Help me out, I'm new to R.
In Python we have the ffill method for this; what is the equivalent here?
Thanks!
library(dplyr) # between() is not base R; dplyr provides it
sampledf$hour[between(sampledf$timeinterval,86,92) | between(sampledf$timeinterval,97,102)]<-1
Basically you subset sampledf's hour column by those cases where timeinterval is between 86 and 92 or (|) between 97 and 102, and assign 1 to all those cases.
If you want to assign 1 to all timeintervals in the given ranges:
sampledf$hour[sampledf$timeinterval %in% c(86:92,97:102)] <- 1
If you want to assign 1 to cases based on the rownumbers of your data:
sampledf$hour[c(86:92,97:102)] <- 1
If you want to add a cumulated sum to your values as in your comment, you can just use the cumsum() function and do:
sampledf$hour[which(sampledf$hour == 1)] <- cumsum(sampledf$hour[which(sampledf$hour == 1)])
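If the ranges should not be hard-coded, one possible sketch is to derive them from the positions of the existing 1s. It assumes the 1s always come in start/end pairs, which holds for the sample data (86 and 92, 97 and 102). The names ones, starts, ends and idx are illustrative.

```r
sampledf <- data.frame(
  timeinterval = 1:120,
  hour = c(rep(NA, 85), 1, rep(NA, 5), 1, rep(NA, 4), 1,
           rep(NA, 4), 1, rep(NA, 18))
)

# Row positions of the existing 1s: 86, 92, 97, 102
ones <- which(sampledf$hour == 1)

# Assumption: odd-numbered hits open a range, even-numbered hits close it
starts <- ones[c(TRUE, FALSE)]
ends <- ones[c(FALSE, TRUE)]

# Expand every start/end pair into its full run of row numbers
idx <- unlist(Map(seq, starts, ends))
sampledf$hour[idx] <- 1
```

A plain forward fill, as ffill does in pandas, would not stop at the closing 1 of each pair, which is why the ranges are built explicitly here.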
I need some help. I am trying to build a function or a loop in R that goes through a binary variable (1 and 0) in a dataframe so that every time a 1 is followed by a 0, I save the value of a third variable (y) from the row where the 0 occurs into a vector. I tried a couple of options based on previous posts, but nothing gives me anything even close to that.
My data looks a bit like that:
ID <- rep(1001, 5)
variable <- c(1, 1, 0, 1, 0)
y <- c(10, 20, 30, 40, 50)
df <- cbind(ID, variable, y)
In this case, for example, the answer would give me a vector with the y values 30 and 50. Sorry if someone already has answered that, I could not find something similar. Thanks a lot!
Here's a vectorized solution. Basically, I paste together variable in position i and i+1. Then I check to see if the combination is "10". The position you want is actually the next one (i.e. i+1), so we add 1.
df <- data.frame(ID, variable, y)
idx <- which(paste0(df$variable[-nrow(df)], df$variable[-1]) == "10") + 1
df$y[idx]
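A variation on the same idea, sketched under the assumption that variable contains only 0 and 1: a drop from 1 to 0 then shows up as a difference of -1 between consecutive elements, so diff() can replace the string pasting.

```r
ID <- rep(1001, 5)
variable <- c(1, 1, 0, 1, 0)
y <- c(10, 20, 30, 40, 50)
df <- data.frame(ID, variable, y)

# diff() gives variable[i + 1] - variable[i]; -1 marks a 1 followed by a 0.
# Adding 1 moves from the position of the 1 to the position of the 0.
idx <- which(diff(df$variable) == -1) + 1
result <- df$y[idx]

print(result)
```

For the sample data this selects rows 3 and 5, i.e. the y values 30 and 50.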
Here is an approach with tidyverse:
library(tidyverse)
df %>%
as_tibble() %>%
mutate(y1 = ifelse(lag(variable) == 1 & variable == 0, y, NA)) %>%
pull(y1)
#output
[1] NA NA 30 NA 50
and in base R:
ifelse(c(NA, df[-nrow(df),2]) == 1 & df[, 2] == 0, df[, 3], NA)
If the lag of variable is 1 and variable is 0, then return y; otherwise return NA.
If you would like to remove the NAs, wrap the result in na.omit().
I have a matrix(100*120) and I am trying to find values <=-1 in each row for every 12 columns. I have tried several times but failed. It is easy to find values which are <= -1, but I do not know how to consider for every 12 columns and store the results for each row. Thanks for any help.
set.seed(100)
Mydata <- sample(x=-3:3,size = 100*120,replace = T)
Mydata <- matrix(data = Mydata,nrow = 100,ncol = 120)
results <- which(Mydata<=-1,arr.ind = T)
You can use the apply function to run which on one row at a time, which returns the column positions in that row that meet the condition. If I misinterpreted what you wanted, you can adjust the MARGIN argument accordingly.
# MARGIN=1 to apply across rows
dd <- apply(Mydata,MARGIN=1,function(x) which(x <= -1))
dd[1] # which columns in row 1 have a value <= -1
You can do this using a combination of apply functions and seq()
#Example Data
set.seed(100)
Mydata <- sample(x=-3:3,size = 100*120,replace = T)
Mydata <- matrix(data = Mydata,nrow = 100,ncol = 120)
#Solution:
Myseq <- sapply(0:9,function(x) seq(1,12,1) + 12*x)
sapply(1:dim(Myseq)[2], function(x) which(Mydata[,Myseq[,x]] <= -1))
This results in a list in which:
each element corresponds to one of your 10 groups of 12 columns
each value in an element is the position (a linear index into that 100 x 12 submatrix) of a value in those 12 columns that is less than or equal to -1.
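If a per-row summary is what is needed, one possible sketch is to count, for every row, how many values are <= -1 within each block of 12 columns. This assumes a count per row and block is an acceptable form of result. The names block and counts are illustrative.

```r
set.seed(100)
Mydata <- matrix(sample(-3:3, size = 100 * 120, replace = TRUE),
                 nrow = 100, ncol = 120)

# Block label for every column: columns 1-12 are block 1, 13-24 block 2, ...
block <- rep(1:10, each = 12)

# For each block, count the values <= -1 in every row
counts <- sapply(1:10, function(b) rowSums(Mydata[, block == b] <= -1))

dim(counts)  # 100 rows, 10 blocks
```

counts[i, b] is then the number of values <= -1 in row i among the columns of block b. Replacing rowSums with a function that calls which would give the positions instead of the counts.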