I have a simple question, so let's take some basic data:
a <- rnorm(100, mean=1, sd = 0.1)
b <- rnorm(100, mean=5, sd = 2)
c <- data.frame(a,b)
Now I want to redefine c$b so that any value below a limit is replaced by a value the user supplies, while values at or above the limit keep their previous value:
c$b <- with(c, ifelse(b < 2, 1, # leave as existing value #))
So when b < 2 we want to assign a value of 1; otherwise keep the existing value.
If we are using ifelse, try
c$b <- with(c, ifelse(b < 2, 1, b))
This doesn't even require ifelse. We can build a logical index of the values in column 'b' that are less than 2 (c$b < 2) and assign 1 to those positions.
c$b[c$b < 2] <- 1
I need to adjust one variable until none of its rows exceeds a specific value. Here is some context:
I have 2 vectors: 'a' and 'b'
I normalize a and b and calculate their ratio 'c' (a_norm/b_norm).
No row of 'c' may be higher than a constant 'd'. Any row of 'c' that is higher than d should be set to d.
After all the 'c' rows that need it are adjusted (let's call this new column c_adjusted), I must recalculate a_norm as c_adjusted*b_norm. Note that the result is no longer normalized, so let's call it a_adjusted.
I normalize a_adjusted to get the new a_norm: a_adjusted_norm = a_adjusted/sum(a_adjusted)*100.
I calculate c again to check whether all rows satisfy the condition after the adjustment. If any row is still higher than d, I have to repeat the process until the condition is satisfied. I would like the final a_adjusted_norm as the result.
Does anybody know how to achieve this? Here is a reproducible example:
library(dplyr)
set.seed(8)
# create data frame
a <- runif(100, min = 0, max = 10)
b <- runif(100, min = 0, max = 10)
a_norm <- a / sum(a) * 100
b_norm <- b / sum(b) * 100
c <- a_norm / b_norm
c_cap <- 1 # c must not be higher than c_cap
df <- data.frame(a_norm, b_norm, c)
df <- df %>%
  mutate(c_adjusted = ifelse(c >= c_cap, c_cap, c), # cap the c rows that are higher than c_cap
         a_adjusted = c_adjusted * b_norm, # recalculate a from the adjusted c
         a_adjusted_norm = a_adjusted / sum(a_adjusted) * 100) # normalize the adjusted a
# calculate c again to see whether it meets the condition
df <- df %>%
  mutate(c = a_adjusted_norm / b_norm) # check whether c satisfies the condition after the adjustment
# If any row of c is still higher than the cap, I must adjust it again and repeat until all rows meet the condition
Thanks in advance!
Generally you can do:
a <- runif(10, min = 0, max = 10)
b <- runif(10, min = 0, max = 10)
a_norm <- a/sum(a)*100
b_norm <- b/sum(b)*100
cap <- 1
c <- a_norm / b_norm
while (max(c) > cap) {
  c[c > cap] <- cap
  a_adjusted <- c * b_norm
  a_adjusted_norm <- a_adjusted / sum(a_adjusted) * 100
  c <- a_adjusted_norm / b_norm
}
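However, this never seems to finish (at least, I stopped it manually after some time). Your approach shrinks the higher values to 1, but the renormalization then scales every value up again: values below 1 creep towards and past 1, and the values that were just capped end up slightly above the cap once more. Note also that a_norm and b_norm both sum to 100, so the b_norm-weighted mean of c is always 1, which means max(c) can only be 1 if every ratio is exactly 1.
So you probably need to adjust the formula you use to recalculate your c values!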
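One possible workaround, as a minimal sketch rather than a definitive fix: stop the loop once the largest ratio is within a small tolerance of the cap, and guard it with an iteration limit. The names tol, max_iter, iter and c_ratio, and the cap of 1.5, are assumptions made for illustration and are not part of the question.

```r
set.seed(8)
a <- runif(100, min = 0, max = 10)
b <- runif(100, min = 0, max = 10)
a_norm <- a / sum(a) * 100
b_norm <- b / sum(b) * 100

cap <- 1.5        # assumed cap, chosen above 1 for illustration
tol <- 1e-8       # assumed tolerance: accept ratios up to cap + tol
max_iter <- 1000  # assumed safety limit so the loop always stops

c_ratio <- a_norm / b_norm
a_adjusted_norm <- a_norm
iter <- 0
while (max(c_ratio) > cap + tol && iter < max_iter) {
  c_ratio <- pmin(c_ratio, cap)                           # cap the ratios
  a_adjusted <- c_ratio * b_norm                          # recompute a from the capped ratios
  a_adjusted_norm <- a_adjusted / sum(a_adjusted) * 100   # renormalize to sum to 100
  c_ratio <- a_adjusted_norm / b_norm                     # ratios after renormalizing
  iter <- iter + 1
}
iter
max(c_ratio)
```

If the loop stops because iter reached max_iter, the condition has not been met and the result should not be trusted; checking max(c_ratio) afterwards tells you which case you are in.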
I am learning to improve my coding in R. I have this code:
data$score[testA == 1] <- testA_score
data$score[testB==1] <- testB_score
So basically I have four columns that I want to combine into one: testA = 1 indicates that the student took version A of the test and testA_score is their score; testB = 1 indicates that the student took version B of the test and testB_score is their score. I want to combine this information into a new column, score.
Also, suppose I had testA, testB, and so on through testH, where all values are 0 or 1. How can I make a new column test_complete that equals 1 if any of the tests equals 1?
Basically, as a former Stata user, I am looking for the R equivalents of egen rowtotal and egen rowfirst. Thanks so much.
You can take the maximum across all the test columns: since the values are only 0 or 1, the row maximum equals 1 whenever at least one test was completed.
testA <- c(1, 0, 0, 1, 0, 0, 0)
testB <- c(0, 1, 0, 0, 1, 0, 1)
testC <- c(0, 0, 0, 1, 0, 1, 0)
df <- as.data.frame(cbind(testA, testB, testC))
df$completed <- apply(df[, 1:3], 1, max)
So if I understand correctly, taking the maximum value by row should give what you need:
binary <- c(0,1)
df <- data.frame(
score1 = sample(binary, 20, replace = TRUE),
score2 = sample(binary, 20, replace = TRUE),
score3 = sample(binary, 20, replace = TRUE)
)
df$passed <- apply(df, 1, max)
head(df)
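The first part of the question, combining the per-version scores into one column, could be handled along these lines. This is only a sketch: the example data and the names test_cols and test_total are made up for illustration, and it assumes each student took at most one version. rowSums is a rough counterpart to Stata's egen rowtotal, and the nested ifelse plays the role of picking the first available score.

```r
# Hypothetical example data: one row per student
df <- data.frame(
  testA = c(1, 0, 1, 0),
  testB = c(0, 1, 0, 0),
  testA_score = c(80, NA, 65, NA),
  testB_score = c(NA, 90, NA, NA)
)

# Take the score from whichever version the student sat; NA if neither
df$score <- ifelse(df$testA == 1, df$testA_score,
                   ifelse(df$testB == 1, df$testB_score, NA))

# Row total of the indicator columns, then a 0/1 "any test taken" flag
test_cols <- c("testA", "testB")
df$test_total <- rowSums(df[test_cols])
df$test_complete <- as.integer(df$test_total > 0)
df
```

With testA through testH, only test_cols would need to change.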
I want to run a partition regression in R, for which I need to assign a factor indicating which partition each observation belongs to. For example, when a value is greater than mean + 2 standard deviations I assign the indicator 2, between mean + 1 sd and mean + 2 sd I assign 1, and so on. I know it can be done with if and else, but when there are many partitions the code gets too long. Is there an easy and succinct way to accomplish this?
m <- mean(x)
s <- sd(x)
signal <- vector()
for (i in seq_along(x)) {
  z <- (x[i] - m) / s # z-score of observation i
  if (z < -3) signal[i] <- -3
  if (z > -3) signal[i] <- -2
  if (z > -2) signal[i] <- -1
  if (z > -1) signal[i] <- 0
  if (z > 1) signal[i] <- 1
  if (z > 2) signal[i] <- 2
  if (z > 3) signal[i] <- 3
}
Thanks to @jogo and @r.user.05apr.
Now I have a slightly different problem. I want to compute the partition over rolling windows of, for example, 20 days: the data for day t should be scaled using the past 20 days (day t-20 to day t-1) and assigned the same values as above according to its z-score. Can the cut function still be used in that case? I have written code with a loop and if statements:
signal <- vector()
n <- 20 # window length
for (i in (n + 1):length(x)) {
  m <- mean(x[(i - n):(i - 1)]) # mean of the previous n days
  s <- sd(x[(i - n):(i - 1)]) # sd of the previous n days
  z <- (x[i] - m) / s
  if (z < -3) signal[i] <- -3
  if (z > -3) signal[i] <- -2
  if (z > -2) signal[i] <- -1
  if (z > -1) signal[i] <- 0
  if (z > 1) signal[i] <- 1
  if (z > 2) signal[i] <- 2
  if (z > 3) signal[i] <- 3
}
You can use cut()
x <- iris$Petal.Length
m <- mean(x)
s <- sd(x)
cut((x - m)/s, breaks = c(-Inf, -3, -2, -1, 1, 2, 3, +Inf), labels = c((-3):3))
To coerce the result to numeric:
as.numeric(as.character(cut((x - m)/s, breaks = c(-Inf, -3, -2, -1, 1, 2, 3, +Inf), labels = c((-3):3))))
Remark:
You can shorten (x - m)/s to scale(x), which returns a one-column matrix.
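For the rolling-window follow-up, cut() should still be usable if the z-scores are computed first and binned afterwards. The following is a sketch under assumptions: the series x is simulated here purely for illustration, and z and w are names introduced for this example.

```r
set.seed(1)
x <- cumsum(rnorm(100)) # hypothetical series standing in for the real data
n <- 20                 # window length

# z-score of each day relative to the previous n days; NA for the first n days
z <- rep(NA_real_, length(x))
for (i in (n + 1):length(x)) {
  w <- x[(i - n):(i - 1)]
  z[i] <- (x[i] - mean(w)) / sd(w)
}

# Bin all z-scores in one vectorised call, then convert the factor to numbers
signal <- cut(z, breaks = c(-Inf, -3, -2, -1, 1, 2, 3, Inf), labels = -3:3)
signal <- as.numeric(as.character(signal))
head(signal, 25)
```

Only the window statistics need the loop; the chain of if statements is replaced by the single cut() call. Note that cut() uses right-closed intervals by default, so a z-score of exactly -3 falls in the lowest bin.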
It depends on how dynamic the value assignment has to be. An alternative option:
criteria <- data.frame(operator = c("<", rep(">", 6)),
                       criterion = c(-3, seq(-3, -1, 1), 1:3),
                       result = c(seq(-3, 0, 1), 1:3),
                       stringsAsFactors = FALSE)
criteria # data frame with individual conditions for if
get_signal <- function(mean, sd, x) {
  dummy <- (x - mean) / sd
  res <- NA # returned when no criterion matches, e.g. a z-score of exactly -3
  for (i in seq_len(nrow(criteria))) {
    if (do.call(criteria[i, 1], list(dummy, criteria[i, 2]))) res <- criteria[i, 3]
  }
  res
}
sapply(-5:10, function(x) get_signal(2, 1, x))
I've written a simple correlation function that takes in three variables. "A" and "B" are numerical vectors of equal length, and "n" is the length.
Corr.fxn <- function(A, B, n){
  Correlation <- (sum((A - mean(A))*(B - mean(B))) / (n-1)) / (sd(A)*sd(B))
  return(Correlation)
}
The function works well enough, but I have many vectors I want to process. What's the best way to modify this code to process all "N take 2" unique analyses for my set of vectors "N"?
EDIT:
Example data showing the structure of the vectors:
A <- c(-1, 0, 1, -1, 0, 1, -1, 0, 1)
B <- c(1, 1, -1, 0, 1, -1, 0, 0, 1)
...
n <- length(A)
So let's say I have vectors A through Z and I want to modify my code to output a new vector containing all {26 take 2} correlation values.
Here is one possible way you can do it assuming you have a bunch of numeric vectors in a list v as follows:
v <- list()
for (i in 1:10) {
  v[[i]] <- sample(1:10, 10, replace = TRUE)
}
apply(combn(1:10, 2), 2, function(x) Corr.fxn(v[[x[1]]], v[[x[2]]], length(v[[x[1]]])))
In this answer, I assume 2 things. First, you want to write a function yourself, since otherwise you can use Hmisc::rcorr. Second, you want the "N take 2" part to be inside the function, otherwise the ways suggested earlier are correct. In that case, you can do this:
Corr.fxn <- function(vectors, n){
  pairs <- combn(length(vectors), 2)
  # square matrix: one row and one column per vector
  cor.mat <- matrix(NA, nrow = length(vectors), ncol = length(vectors))
  for (i in 1:ncol(pairs)){
    A <- vectors[[pairs[1, i]]]
    B <- vectors[[pairs[2, i]]]
    cor.mat[pairs[1, i], pairs[2, i]] <- (sum((A - mean(A))*(B - mean(B))) / (n-1)) / (sd(A)*sd(B))
  }
  # mirror the upper triangle into the lower triangle
  cor.mat[lower.tri(cor.mat)] <- t(cor.mat)[lower.tri(cor.mat)] ###
  diag(cor.mat) <- 1 ###
  cor.mat <- data.frame(cor.mat) ###
  row.names(cor.mat) <- colnames(cor.mat) <- names(vectors) ###
  return(cor.mat)
}
The lines that end in ### are there for decorative reasons. The main input is a list called "vectors". So it works as follows:
A<- runif(100, 1, 100)
B<- runif(100, 30, 50)
C<- runif(100, 120, 200)
> Corr.fxn(list(A=A, B=B, C=C), n=100)
A B C
A 1.0000000 -0.11800104 -0.13980458
B -0.1180010 1.00000000 0.04933581
C -0.1398046 0.04933581 1.00000000
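If writing the formula by hand is not a requirement, base R's cor() computes the same Pearson correlations for every pair at once when the vectors are bound into a matrix. Below is a sketch with made-up data; the names vectors, cor_mat, pairs and pair_cor are used here only for illustration.

```r
set.seed(42)
# Hypothetical input: a named list of equal-length numeric vectors
vectors <- list(A = runif(100, 1, 100),
                B = runif(100, 30, 50),
                C = runif(100, 120, 200),
                D = runif(100))

# Full correlation matrix, one column per vector
cor_mat <- cor(do.call(cbind, vectors))

# One value per unordered pair, in the order produced by combn()
pairs <- combn(length(vectors), 2)
pair_cor <- cor_mat[t(pairs)]
names(pair_cor) <- paste(names(vectors)[pairs[1, ]],
                         names(vectors)[pairs[2, ]], sep = "-")
pair_cor
```

Indexing the matrix with the two-column matrix t(pairs) picks out exactly the (row, column) entries for each pair, so the result is a vector with choose(N, 2) elements.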
I'm certain there is an easier way to accomplish this. I have the following dataframe:
B <- c(1, 1, 1, 0, 1, 2, 2, 0, 0, 0)
A <- c(1:10)
df <- as.data.frame(cbind(A,B))
What I would like to do is add a third column (C) that takes the value of column B, unless column B is 0, in which case it applies the percent change in column A to the previous value of column C.
Here is what I did:
library(Hmisc)
df$New <- ifelse(df$B!=0, df$B, df$A/Lag(df$A, shift=1)*Lag(df$B, shift=1))
df$New2 <- ifelse(df$New !=0, df$New, df$A/Lag(df$A, shift=1)*Lag(df$New, shift=1))
df$New3 <- ifelse(df$New2 !=0, df$New2, df$A/Lag(df$A, shift=1)*Lag(df$New2, shift=1))
df$C <- pmax(df$New, df$New2, df$New3)
df<- df[c(1,2,6)]
Essentially, I need to calculate each value of the column from the previously calculated result, so maybe sapply would work, but I'm not sure.
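Because each value of C depends on the value computed just before it, a plain for loop is one straightforward way to express this. The sketch below reflects one reading of the rule (C = B when B is non-zero, otherwise the previous C scaled by A[i] / A[i - 1]) and assumes the first value of B is non-zero.

```r
df <- data.frame(A = 1:10, B = c(1, 1, 1, 0, 1, 2, 2, 0, 0, 0))

C <- numeric(nrow(df))
C[1] <- df$B[1] # assumption: the first B is non-zero, so it seeds the series
for (i in seq_len(nrow(df))[-1]) {
  if (df$B[i] != 0) {
    C[i] <- df$B[i]                             # use B directly
  } else {
    C[i] <- C[i - 1] * df$A[i] / df$A[i - 1]    # carry the previous C forward by A's change
  }
}
df$C <- C
df
```

Unlike the chained ifelse columns, this handles runs of zeros of any length without adding one helper column per consecutive zero.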