I have a dataset, and within it I want to fit a model on values selected by a sequence. In my computation I want i and j to change automatically: the sequence for i should go from seq(1, 18, 2) to seq(2, 19, 2), seq(3, 20, 2), seq(4, 21, 2), … seq(9, 26, 2), while j correspondingly goes from seq(19, 27) to seq(20, 27), seq(21, 27), seq(22, 27), … seq(27, 27). At the same time, the argument obs = c(i, 18) in the loop should change to c(i, 19), c(i, 20), … c(i, 26). I have tried the following, but I have to change i and the first value of j manually at each step. Any help is appreciated!
for (i in seq(1, 18, 2)) {
  for (j in seq(19, 27)) {
    output <- arguments(……., obs = c(i, 18), pred = c(j, j + 1))
  }
}
But I have to change i and j in the arguments manually at each step; I want R to change them automatically in the loop. Any help, please?
Here is one option with Map, which builds all nine i sequences at once:
Map(seq, 1:9, 18:26, MoreArgs = list(by = 2))
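To check what the Map call produces, and to build the matching j ranges the same way, here is a minimal sketch. It assumes, as in the question, that j always runs up to 27; the names starts, ends, i_seqs and j_seqs are only for illustration.

```r
starts <- 1:9    # first value of each i sequence
ends   <- 18:26  # last value of each i sequence; also the second element of obs
i_seqs <- Map(seq, starts, ends, MoreArgs = list(by = 2))
j_seqs <- Map(seq, ends + 1, 27)   # seq(19, 27), seq(20, 27), ..., seq(27, 27)

i_seqs[[2]]   # 2 4 6 8 10 12 14 16 18
j_seqs[[2]]   # 20 21 22 23 24 25 26 27
```

The lists line up by position: the k-th i sequence goes with the k-th j range and with obs = c(i, ends[k]).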
If we want the loop values to change automatically, we could wrap the loop in a function:
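# argument order: start of i, end of i, step for i, last value of j
f1 <- function(input1, input2, by, input3) {
  s1 <- seq(input1, input2, by = by)  # values for i
  s2 <- seq(input2 + 1, input3)       # values for j
  output <- c()
  for (i in s1) {
    for (j in s2) {
      # somefunction is a placeholder for the real computation,
      # e.g. one using obs = c(i, input2), pred = c(j, j + 1)
      output <- c(output, somefunction)
    }
  }
  output
}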
and then we call it as
f1(1, 18, 2, 27)
and apply it to all nine start/end pairs:
Map(f1, 1:9, 18:26, 2, 27)
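Here is a runnable sketch of f1 with the step size as the third argument, so that f1(1, 18, 2, 27) means i in seq(1, 18, by = 2) and j up to 27. The placeholder somefunction is replaced by a hypothetical helper, describe_pair, which only records the obs and pred pairs that the real model call would receive.

```r
# Hypothetical stand-in for the real model call: returns a label for one (obs, pred) pair
describe_pair <- function(i, obs_end, j) {
  sprintf("obs = c(%d, %d), pred = c(%d, %d)", i, obs_end, j, j + 1)
}

f1 <- function(input1, input2, by, input3) {
  s1 <- seq(input1, input2, by = by)  # values for i
  s2 <- seq(input2 + 1, input3)       # values for j
  output <- c()
  for (i in s1) {
    for (j in s2) {
      output <- c(output, describe_pair(i, input2, j))
    }
  }
  output
}

res <- Map(f1, 1:9, 18:26, 2, 27)
res[[1]][1]   # "obs = c(1, 18), pred = c(19, 20)"
res[[9]][1]   # "obs = c(9, 26), pred = c(27, 28)"
```

Map recycles the length-one arguments 2 and 27, so every call gets the same step and the same last j.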
I wrote a (not pretty, but working) function that makes one long vector from a given column of my data frame and inserts a fixed number of NAs every time the ID changes. Now I am looking for a way to automatically rename the variable array inside the function, so that the output carries an individual name (to make it easy to identify which values it holds, and to prevent it from being overwritten when I run the function on a different column). One possibility would be to name it x or array_x. I tried several variations of this:
c("array_", as.character(x)) <- array
rm(array)
print(c("array_", as.character(x)))
But it only throws errors; I assume that is because the string is not recognized as a variable name. Can anyone help me solve this?
Here is some example data and the part of the function that already works:
ID <- c(rep ("A", 3), rep("B", 3))
Day <- c(1,2,3,1,2,3)
Score1 <- c(12,4, 16, 9, 12, 13)
Score2 <- c(1, 4, 4, 1, 3, 5)
Score3 <- c(23, 19, 12, 12, 24, 11)
df <- data.frame(ID, Day, Score1, Score2, Score3)
print(df)
foo <- function(x) {
  array <- c(df[1, x])
  for (i in 2:nrow(df)) {
    if (df[i, 1] == df[i - 1, 1]) {
      array <- append(array, df[i, x])
    } else {
      array <- append(array, rep(NA, 5))
      array <- append(array, df[i, x])
    }
  }
  # rename array
  print(array)
}
foo("Score1")
I am still new to R and need your help.
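An expression such as c("array_", x) cannot be the target of <-, because R needs a variable name there, not a character vector built at run time. One common approach, sketched here under that assumption, is to have the function return the vector and let the caller choose the name: collect the results in a named list, or, if a separate variable is really wanted, create it with assign(). The helper pad_by_id and its gap argument are hypothetical; it uses rle() so NAs are inserted each time ID changes between consecutive rows, giving the same values foo() prints.

```r
ID <- c(rep("A", 3), rep("B", 3))
Day <- c(1, 2, 3, 1, 2, 3)
Score1 <- c(12, 4, 16, 9, 12, 13)
Score2 <- c(1, 4, 4, 1, 3, 5)
Score3 <- c(23, 19, 12, 12, 24, 11)
df <- data.frame(ID, Day, Score1, Score2, Score3)

# Hypothetical helper: one long vector from column x, with `gap` NAs inserted
# wherever ID changes between consecutive rows
pad_by_id <- function(data, x, gap = 5) {
  runs <- rle(as.character(data$ID))
  group <- rep(seq_along(runs$lengths), runs$lengths)
  pieces <- split(data[[x]], group)
  out <- pieces[[1]]
  for (p in pieces[-1]) out <- c(out, rep(NA, gap), p)
  out
}

# Option 1: keep every result in one named list
score_cols <- c("Score1", "Score2", "Score3")
arrays <- setNames(lapply(score_cols, function(col) pad_by_id(df, col)),
                   paste0("array_", score_cols))
arrays$array_Score1   # 12 4 16 NA NA NA NA NA 9 12 13

# Option 2: create a variable whose name is built from a string
assign(paste0("array_", "Score1"), pad_by_id(df, "Score1"))
array_Score1
```

The named list is often easier to work with afterwards (for example with lapply), while assign() gives exactly the array_x variable asked for.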
I have data in the form of an array, and I want to create a function that takes three values from each row of a data frame and uses them as indices to get values from the array.
The function should take three values that identify one element of the array:
select_value <- function(A, B, C) {
result <-array_main[A, B, C]
return (result)
}
The array and the index data frame are:
vector1 <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
vector2 <- c(100, 200, 300, 400, 500, 600, 700, 800, 900)
vector3 <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)
array_main <- array(c(vector1, vector2, vector3), dim = c(3, 3, 3))
df <- data.frame (C1 = c(1, 2, 3), C2 = c(2, 1, 3), C3 = c(3, 2, 1))
So the function should take, for example, the first row of df, (1, 2, 3), use it as indices into array_main to get array_main[1, 2, 3], and return 8; then take the second row, (2, 1, 2), and return array_main[2, 1, 2], which is 200; and finally take the third row, (3, 3, 1), and return array_main[3, 3, 1], which is 90.
How can I get the values stored in the (result) variable as a vector?
Thank you so much for your help
You don't really need the select_value function at all. Indexing an array with a matrix that has one column per dimension does the whole thing in one step:
array_main[as.matrix(df)]
#> [1] 8 200 90
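Matrix indexing works because R stores an array as one vector in column-major order, so each (i, j, k) row of the index matrix maps to a single position in that vector. The sketch below does the arithmetic by hand for the 3 x 3 x 3 array; linear_index is a hypothetical helper written for illustration, not a base R function.

```r
vector1 <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
vector2 <- c(100, 200, 300, 400, 500, 600, 700, 800, 900)
vector3 <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)
values <- c(vector1, vector2, vector3)
array_main <- array(values, dim = c(3, 3, 3))

# Position of element idx = (i, j, k) in the underlying vector (column-major order).
# For dims c(3, 3, 3) this is 1 + (i - 1) * 1 + (j - 1) * 3 + (k - 1) * 9
linear_index <- function(idx, dims) {
  strides <- cumprod(c(1, head(dims, -1)))
  1 + sum((idx - 1) * strides)
}

linear_index(c(1, 2, 3), dim(array_main))   # 22
values[22]                                  # 8, the same as array_main[1, 2, 3]
```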
Learning points
A couple of other points to note: first, your function select_value is longer than it needs to be. Writing:
select_value <- function(A, B, C) {
result <-array_main[A, B, C]
return (result)
}
Is the same as writing
select_value <- function(A, B, C) {
array_main[A, B, C]
}
The second point is that it isn't a great idea to write a function that relies on data from outside the function that isn't passed in as an argument. A better way to write such a function would be:
select_value <- function(data, A, B, C) {
data[A, B, C]
}
However, once we do that, we realise that select_value is essentially the square-bracket operator itself. The above function is almost identical to defining select_value as:
select_value <- `[`
Perhaps more useful would be defining your function like this:
select_values <- function(data_array, index_df) {
data_array[as.matrix(index_df)]
}
Which, in your case, would produce:
select_values(array_main, df)
#> [1] 8 200 90
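If you do want to keep a three-argument function and apply it row by row, mapply() can feed it one column of df per argument. This sketch uses the data-passing version of select_value from above and checks that it gives the same vector as matrix indexing; the matrix version avoids one function call per row, so it is usually the simpler choice.

```r
vector1 <- c(10, 20, 30, 40, 50, 60, 70, 80, 90)
vector2 <- c(100, 200, 300, 400, 500, 600, 700, 800, 900)
vector3 <- c(2, 4, 6, 8, 10, 12, 14, 16, 18)
array_main <- array(c(vector1, vector2, vector3), dim = c(3, 3, 3))
df <- data.frame(C1 = c(1, 2, 3), C2 = c(2, 1, 3), C3 = c(3, 2, 1))

select_value <- function(data, A, B, C) {
  data[A, B, C]
}

# One call per row: A, B and C come from the columns of df, data stays fixed
by_rows <- mapply(select_value, A = df$C1, B = df$C2, C = df$C3,
                  MoreArgs = list(data = array_main))
by_matrix <- array_main[as.matrix(df)]
by_rows   # 8 200 90
```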
Long time reader, first time poster. I have not found any previous questions about my problem. I would like to create multiple linear functions that I can later apply to variables. I have a data frame of slopes, df_slope, and a data frame of constants, df_constant.
Dummy data:
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
I would like to construct functions such as
myfunc <- function(slope, constant, trvalue) {
  result <- trvalue * slope + constant
  return(result)
}
where the slope and constant values are
slope<- df_slope[i,j]
constant<- df_constant[i,j]
I have tried many ways, for example this one, which tries to build a data frame of functions with a for loop:
myfunc_all<-data.frame()
for(i in 1:5){
for(j in 1:3){
myfunc_all[i,j]<-function (x){ x*df_slope[i,j]+df_constant[i,j] }
full_func[[i]][j]<- func_full
}
}
without success. The slope and constant values are paired: df_slope[i,j] goes with df_constant[i,j]. The desired end result would be some kind of data frame from which I can retrieve a function by its coordinates, for example:
myfunc_all[i,j]
but any form would be great. For example
myfunc_all[2,1]
in our case would be
function(x) { x*2 + 4 }
which I can apply to different x values. I hope my problem is clear.
You are running into lazy evaluation and variable scoping: functions built inside a for loop all refer to the same loop variables, so after the loop they only see their final values. It's safer to use something like mapply, which creates a separate closure for each call. Try
myfunc_all <- with(expand.grid(1:5, 1:3), mapply(function(i, j) {
  function(x) {
    x * df_slope[i, j] + df_constant[i, j]
  }
}, Var1, Var2))
dim(myfunc_all) <- c(5, 3)
This creates an array-like object (a list with a dim attribute). The only difference from what you asked for is that you need double brackets to extract a function. For example:
myfunc_all[[2,1]](0)
# [1] 4
myfunc_all[[5,3]](0)
# [1] -1
Alternatively, you can write a function that returns a function. That would look like this:
myfunc_all <- (function(slopes, constants) {
  function(i, j)
    function(x) x * slopes[i, j] + constants[i, j]
})(df_slope, df_constant)
Then, rather than using brackets, you call the function with parentheses:
myfunc_all(2,1)(0)
# [1] 4
myfunc_all(5,3)(0)
# [1] -1
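The scoping issue mentioned above can be reproduced in a few lines. Functions created inside a plain for loop all look up the loop variable in the same environment, so after the loop they all see its last value; wrapping each definition in local() (or using mapply, as above) gives every function its own copy. The names below are illustrative.

```r
# Each function is meant to add its own k to x
loop_funs <- vector("list", 3)
for (k in 1:3) {
  loop_funs[[k]] <- function(x) x + k   # k is looked up only when the function is called
}

local_funs <- vector("list", 3)
for (k in 1:3) {
  local_funs[[k]] <- local({
    k_now <- k                          # copy taken while the loop is at this k
    function(x) x + k_now
  })
}

loop_funs[[1]](0)    # 3: every function sees the final value of k
local_funs[[1]](0)   # 1
```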
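df_slope <- data.frame(var1 = c(1, 2, 3, 4, 5), var2 = c(2, 3, 4, 5, 6), var3 = c(-1, 1, 0, -10, 1))
df_constant <- data.frame(var1 = c(3, 4, 6, 7, 9), var2 = c(2, 3, 4, 5, 6), var3 = c(-1, 7, 8, 0, -1))
# one function per row; local() freezes the current row index inside each closure
functions <- vector(mode = "list", length = nrow(df_slope))
for (i in seq_len(nrow(df_slope))) {
  functions[[i]] <- local({
    row <- i
    function(j, x) df_slope[row, j] * x + df_constant[row, j]
  })
}
# f(i, j, x) applies the (slope, constant) pair at row i, column j to x
f <- function(i, j, x) {
  functions[[i]](j, x)
}
f(1, 1, 1:10)
f(3, 2, 5:10)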
I am working on a school assignment. I need to transform the columns of a data frame using a for loop and the bcPower function from the car package. My data frame, bb2.df, consists of 13 columns of baseball statistics for 337 players. The data is from:
http://ww2.amstat.org/publications/jse/datasets/baseball.dat.txt
I read the data in using:
bb.df <- read.fwf("baseball.dat.txt",widths=c(4,6,6,4,4,3,3,3,4,4,4,3,3,2,2,2,2,19))
And then I created a second data frame just for the numeric stats using:
bb2.df <- bb.df[,1:13]
library(car)
Then I unsuccessfully tried to build the for loop.
> bb2.df[[i]] <- bcPower(bb2.df[[i]],c)
> for (i in 1:ncol(bb2.df)) {
+ c <- coef(powerTransform(bb2.df[[i]]))
+ bb2.df[[i]] <- bcPower(bb2.df[[i]],c)
+ }
Error in bc1(out[, j], lambda[j]) :
First argument must be strictly positive.
The loop seems to transform the first three columns but stops.
What am I doing wrong?
This solution tests whether a column appears to contain logical values and omits such columns from the transformation, replaces zero values in the vectors with a small number outside the range of the actual values, and stores the transformed values in a new data frame, retaining the column and row names.
I have also tested all of the variables for normality before and after the transformation. I looked for an interesting variable: one whose transformed version has a large Shapiro-Wilk p-value and whose p-value also changed a lot. Finally, the interesting variable is scaled in both its original and transformed versions, and the two are overlaid on a density plot.
library(car); library(ggplot2); library(reshape2)
# see this link for column names and type hints
# http://ww2.amstat.org/publications/jse/datasets/baseball.txt
# add placeholder column for opening quotation mark
bb.df <-
read.fwf(
"http://ww2.amstat.org/publications/jse/datasets/baseball.dat.txt",
widths = c(4, 6, 6, 4, 4, 3, 3, 3, 4, 4, 4, 3, 3, 2, 2, 2, 2, 2, 17)
)
# remove placeholder column
bb.df <- bb.df[,-(ncol(bb.df) - 1)]
names(bb.df) <- make.names(
c(
'Salary', 'Batting average', 'OBP', 'runs', 'hits', 'doubles', 'triples',
'home runs', 'RBI', 'walks', 'strike-outs', 'stolen bases', 'errors',
"free agency eligibility", "free agent in 1991/2" ,
"arbitration eligibility", "arbitration in 1991/2", 'name'
)
)
# test for boolean/logical values... don't try to transform them
logicals.test <- apply(
bb.df,
MARGIN = 2,
FUN = function(one.col) {
asnumeric <- as.numeric(one.col)
aslogical <- as.logical(asnumeric)
renumeric <- as.numeric(aslogical)
matchflags <- renumeric == asnumeric
cant.be.logical <- any(!matchflags)
cant.be.logical  # return the flag; printing it only adds console noise
}
)
logicals.test[is.na(logicals.test)] <- FALSE
probably.numeric <- bb.df[, logicals.test]
result <- apply(probably.numeric, MARGIN = 2, function(one.col)
{
# can't transform vectors containing non-positive values
# replace zeros with something small
non.zero <- one.col[one.col > 0]
small <- min(non.zero) / max(non.zero)
zeroless <- one.col
zeroless[zeroless == 0] <- small
c <- coef(powerTransform(zeroless))
transformation <- bcPower(zeroless, c)
return(transformation)
})
result <- as.data.frame(result)
row.names(result) <- bb.df$name
cols2test <- names(result)
normal.before <- sapply(cols2test, function(one.col) {
print(one.col)
temp <- shapiro.test(bb.df[, one.col])
return(temp$p.value)
})
normal.after <- sapply(cols2test, function(one.col) {
print(one.col)
temp <- shapiro.test(result[, one.col])
return(temp$p.value)
})
more.normal <- cbind.data.frame(normal.before, normal.after)
more.normal$more.normal <-
more.normal$normal.after / more.normal$normal.before
more.normal$interest <-
more.normal$normal.after * more.normal$more.normal
interesting <-
rownames(more.normal)[which.max(more.normal$interest)]
data2plot <-
cbind.data.frame(bb.df[, interesting], result[, interesting])
names(data2plot) <- c("original", "transformed")
data2plot <- scale(data2plot)
data2plot <- melt(data2plot)
names(data2plot) <- c("Var1", "dataset", interesting)
ggplot(data2plot, aes(x = .data[[interesting]], fill = dataset)) +
geom_density(alpha = 0.25) + xlab(interesting)
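The before/after normality comparison in the code above can be illustrated on simulated data without the baseball file. This sketch assumes a log-normal sample, for which the Box-Cox transform with lambda = 0 (a plain log) is the natural choice, and computes the analogues of normal.before, normal.after and their ratio; the seed and sample size are arbitrary.

```r
set.seed(42)
y <- exp(rnorm(200))   # simulated right-skewed, strictly positive data

p_before <- shapiro.test(y)$p.value        # raw data: strongly non-normal
p_after  <- shapiro.test(log(y))$p.value   # log = Box-Cox with lambda = 0
ratio <- p_after / p_before                # the "more.normal" measure used above
c(before = p_before, after = p_after, ratio = ratio)
```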
Original, incomplete answer:
I believe you're attempting invalid power transformations: bcPower needs strictly positive values, so vectors containing zeros fail, and vectors with no variance cannot be transformed either.
The fact that you are copying bb.df into bb2.df and then overwriting is a sure sign that you should really be using apply.
This doesn't create a useful data frame, but it should get you started:
library(car)
bb.df <-
read.fwf(
"baseball.dat.txt",
widths = c(4, 6, 6, 4, 4, 3, 3, 3, 4, 4, 4, 3, 3, 2, 2, 2, 2, 19)
)
bb.df[bb.df == 0] <- NA
# skip last (text) col
for (i in 1:(ncol(bb.df) - 1)) {
print(i)
# use comma to indicate indexing by column
temp <- bb.df[, i]
temp[temp == 0] <- NA
temp <- temp[complete.cases(temp)]
if (length(unique(temp)) > 1) {
c <- coef(powerTransform(temp))   # estimate lambda on the positive, non-missing values
print(bcPower(temp, c))
} else {
print(paste0("column ", i, " is invariant"))
}
}
# apply solution
result <- apply(bb.df[,-ncol(bb.df)], MARGIN = 2, function(one.col)
{
temp <- one.col
temp[temp == 0] <- NA
temp <- temp[complete.cases(temp)]
if (length(unique(temp)) > 1) {
c <- coef(powerTransform(temp))
transformation <- bcPower(temp, c)
return(transformation)
} else
{
print("skipping invariant column")
return(NULL)
}
})
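To see why zeros are a problem, here is the Box-Cox formula written out in base R, together with the zero-replacement rule from the solution above and a simple profile-likelihood search for lambda. This is a sketch of the standard one-sample Box-Cox estimate, not car's implementation (powerTransform may differ in details), and box_cox, replace_zeros and bc_loglik are hypothetical helper names.

```r
# Box-Cox transform; only defined for strictly positive y
box_cox <- function(y, lambda) {
  if (any(y <= 0)) stop("Box-Cox needs strictly positive values")
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Replace zeros with min(positive) / max(positive), as in the solution above
replace_zeros <- function(y) {
  positive <- y[y > 0]
  y[y == 0] <- min(positive) / max(positive)
  y
}

# Profile log-likelihood of lambda for an intercept-only normal model,
# including the Jacobian term (lambda - 1) * sum(log(y))
bc_loglik <- function(lambda, y) {
  z <- box_cox(y, lambda)
  n <- length(y)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(y))
}

set.seed(1)
y <- exp(rnorm(300))   # simulated log-normal data, so lambda near 0 should fit best
lambda_hat <- optimize(bc_loglik, c(-2, 2), y = y, maximum = TRUE)$maximum
lambda_hat

replace_zeros(c(0, 2, 4, 8))    # 0.25 2.00 4.00 8.00
try(box_cox(c(0, 2, 4), 0.5))   # fails: zeros are not allowed
```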
I'm looking to take the following vector:
v1 = c(2, 5, 7, 9, 1)
I want to run a loop of iterative sampling, placing the sampled values into a new vector v2, and break out of this process when the sum of these values is greater than 12.
This is what I have so far:
v2 = c()
while (sum(v2) > 12) {
sample(v1, 1, replace = FALSE)
if(sum(v2) > 12))
break
}
Not sure if I'm on the right track. Appreciate the help.
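Your code has a few problems: the while condition sum(v2) > 12 is FALSE at the start (the sum of an empty vector is 0), so the loop body never runs; the sampled value is never stored in v2; and the if line has an extra closing parenthesis. Since you want to stop based on a check inside the loop, break fits more naturally with a repeat loop: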
v1 = c(2, 5, 7, 9, 1)
v2 <- c()
repeat {
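remaining <- v1[!v1 %in% v2]
# sample.int avoids a gotcha: sample(x, 1) on a single number x draws from 1:x
v2 <- c(v2, remaining[sample.int(length(remaining), 1)])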
if( sum(v2) > 12 )
break
}
print(v2)
# [1] 5 7 9   (one possible run; the draws are random)
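An alternative that avoids growing v2 inside a loop is to shuffle v1 once with sample() and keep values until the running total first exceeds 12. This gives the same kind of result (distinct values, stopping at the draw that pushes the sum past 12). It assumes sum(v1) is larger than the threshold; otherwise which() finds no position and stop_at is NA. The name threshold is illustrative.

```r
v1 <- c(2, 5, 7, 9, 1)
threshold <- 12

shuffled <- sample(v1)                               # random order, no repeats
stop_at <- which(cumsum(shuffled) > threshold)[1]    # first position where the sum passes 12
v2 <- shuffled[seq_len(stop_at)]
v2
```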