I wrote a (not pretty, but working) function that builds one long vector from a given column of my data frame and inserts a fixed number of NAs every time the ID changes. Now I am looking for a way to automatically rename the variable array inside the function so that the output carries an individual name (to make it easy to identify which values it holds and to prevent it from being overwritten when I run the function for a different column). One possibility would be to name it x or array_x. What I tried is several variations of this:
c("array_", as.character(x)) <- array
rm(array)
print(c("array_", as.character(x)))
But it only throws errors, I assume because the string is not recognized as a variable name. Can anyone help me solve this?
Here is some example data and the part of the function that is already running:
ID <- c(rep ("A", 3), rep("B", 3))
Day <- c(1,2,3,1,2,3)
Score1 <- c(12,4, 16, 9, 12, 13)
Score2 <- c(1, 4, 4, 1, 3, 5)
Score3 <- c(23, 19, 12, 12, 24, 11)
df <- data.frame(ID, Day, Score1, Score2, Score3)
print(df)
foo <- function(x) {
array <- c(df[1,x])
for (i in 2:nrow(df))
{
if (df[i, 1] == df[i-1, 1 ]) {
array <- append (array, df[i, x])
}
else
{
array <- append (array, rep (NA, 5))
array <- append (array, df[i, x])
}
}
#rename array
print (array)
}
foo("Score1")
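One way to get individually named results is sketched below, assuming the IDs sit in contiguous blocks as in the example data. Instead of renaming a variable inside the function, the function returns the vector and the caller stores it under a name built with paste0(). The helper pad_by_id and the list results are hypothetical names introduced for illustration; they are not part of the original code.

```r
# Example data from the question (two score columns are enough here)
df <- data.frame(
  ID = c(rep("A", 3), rep("B", 3)),
  Day = c(1, 2, 3, 1, 2, 3),
  Score1 = c(12, 4, 16, 9, 12, 13),
  Score2 = c(1, 4, 4, 1, 3, 5)
)

# Hypothetical helper: one long vector with n_pad NAs between ID blocks
pad_by_id <- function(data, column, n_pad = 5) {
  # keep the IDs in order of appearance rather than alphabetical order
  groups <- split(data[[column]], factor(data$ID, levels = unique(data$ID)))
  padded <- lapply(groups, function(g) c(g, rep(NA, n_pad)))
  out <- unlist(padded, use.names = FALSE)
  # drop the padding that was added after the last block
  head(out, length(out) - n_pad)
}

# Store each result under a name built from the column name
results <- list()
for (col in c("Score1", "Score2")) {
  results[[paste0("array_", col)]] <- pad_by_id(df, col)
}
print(results$array_Score1)

# If a separate variable is really wanted, assign() accepts the name as a string
assign(paste0("array_", "Score1"), results[["array_Score1"]])
```

Keeping the vectors in a named list usually makes them easier to loop over later than a set of separately named variables would be.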
Long time reader, first time poster. I have not found any previous questions about my current problem. I would like to create multiple linear functions, which I can later apply to variables. I have a data frame of slopes, df_slope, and a data frame of constants, df_constant.
Dummy data:
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
I would like to construct functions such as
myfunc <- function(slope, constant, trvalue){
result <- trvalue*slope+constant
return(result)}
where the slope and constant values are
slope<- df_slope[i,j]
constant<- df_constant[i,j]
I have tried many ways, for example like this, creating a dataframe of functions with for loop
myfunc_all<-data.frame()
for(i in 1:5){
for(j in 1:3){
myfunc_all[i,j]<-function (x){ x*df_slope[i,j]+df_constant[i,j] }
full_func[[i]][j]<- func_full
}
}
without success. The slope-constant values are paired up, such as df_slope[i,j] is paired with df_constant[i,j]. The desired end result would be some kind of data frame, from where I can call a function by giving it the coordinates, for example like this:
myfunc_all[i,j]
but any form would be great. For example
myfunc_all[2,1]
in our case would be
function (x){ x*2+4 }
which I can apply to different x values. I hope my problem is clear.
You have a slight problem with lazy evaluation and variable scope when you use a for loop to build functions: each function looks up i and j when it is called, not when it is created. It is a bit safer to use something like mapply, which creates closures for you. Try
myfunc_all <- with(expand.grid(1:5, 1:3), mapply(function(i, j) {
function(x) {
x*df_slope[i,j]+df_constant[i,j]
}
},Var1, Var2))
dim(myfunc_all) <- c(5,3)
This will create an array like object. The only difference is that you need to use double brackets to extract the function. For example
myfunc_all[[2,1]](0)
# [1] 4
myfunc_all[[5,3]](0)
# [1] -1
Alternatively, you can write a function that returns a function. That would look like
myfunc_all <- (function(slopes, constants) {
function(i, j)
function(x) x*slopes[i,j]+constants[i,j]
})(df_slope, df_constant)
Then, rather than using brackets, you call the function with parentheses.
myfunc_all(2,1)(0)
# [1] 4
myfunc_all(5,3)(0)
# [1] -1
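To illustrate the scoping issue mentioned in this answer, here is a minimal sketch with a single vector of slopes. The names slopes, naive, make_line and fixed are hypothetical and chosen only for this example.

```r
slopes <- c(1, 2, 3)

# Naive version: every closure refers to the same loop variable i
naive <- list()
for (i in seq_along(slopes)) {
  naive[[i]] <- function(x) x * slopes[i]
}
# After the loop i is 3, so the "first" function uses the third slope
print(naive[[1]](10))

# Factory version: each call gets its own k; force() evaluates it immediately
make_line <- function(k) {
  force(k)
  function(x) x * slopes[k]
}
fixed <- lapply(seq_along(slopes), make_line)
print(fixed[[1]](10))
```

The factory function gives each closure its own environment, which is the same effect that mapply achieves in the answer above.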
df_slope <- data.frame(var1 = c(1, 2, 3,4,5), var2 = c(2,3,4,5,6), var3 = c(-1, 1, 0, -10, 1))
df_constant<- data.frame(var1 = c(3, 4, 6,7,9), var2 = c(2,3,4,5,6), var3 = c(-1, 7, 8, 0, -1))
functions = vector(mode = "list", length = nrow(df_slope))
for (i in 1:nrow(df_slope)) {
functions[[i]] = function(i,x) { df_slope[i]*x + df_constant[i]}
}
f = function(i, x) {
functions[[i]](i, x)
}
f(1, 1:10)
f(3, 5:10)
I have a data set, and within it I want to fit a model on values selected by a sequence. I want i and j to change automatically: the sequence for i should go from seq(1, 18, 2) to seq(2, 19, 2), seq(3, 20, 2), seq(4, 21, 2), ..., seq(9, 26, 2), while the range for j should go from (19, 27) to (20, 27), (21, 27), (22, 27), ..., (27, 27). At the same time, the argument obs = c(i, 18) inside the loop should become c(i, 19), c(i, 20), ..., c(i, 26). I have tried the following, but I have to change i and the first value of j manually at each step:
for (i in seq (1, 18, 2)) {
for (j in seq (19,27)) {
output <- arguments (……., obs = c (i, 18), pred = c (j, j+1))
}
}
I have to change i and j in the sequence manually, but I want R to change them automatically in the loop. Any help, please?
Here is one option with Map
Map(seq, 1:9, 18:26, MoreArgs = list(by = 2))
If we want the loop values to change automatically, we could use a function:
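f1 <- function(input1, input2, by, input3) {
  # argument order matches the calls: start, end, step, last value
  s1 <- seq(input1, input2, by = by)
  s2 <- seq(input2 + 1, input3)
  output <- c()
  for (i in s1) {
    for (j in s2) {
      # somefunction is a placeholder for the actual model call
      output <- c(output, somefunction)
    }
  }
  output  # return the collected results
}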
and then we call it as
f1(1, 18, 2, 27)
And applying this on multiple values
Map(f1, 1:9, 18:26, 2, 27)
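As a rough, runnable sketch of the same idea, the version below replaces the unknown model call with a stand-in that only records which obs and pred values it would receive. Both fit_stub and run_window are hypothetical names; the real call from the question is not known, so nothing here reproduces it.

```r
# Hypothetical stand-in for the model call: just records its arguments
fit_stub <- function(obs, pred) {
  c(obs_from = obs[1], obs_to = obs[2], pred_from = pred[1], pred_to = pred[2])
}

# One "window": i runs from start to end in steps of by, j from end + 1 to last
run_window <- function(start, end, last, by = 2) {
  rows <- list()
  for (i in seq(start, end, by = by)) {
    for (j in seq(end + 1, last)) {
      # obs uses the current window end instead of a hard-coded 18
      rows[[length(rows) + 1]] <- fit_stub(obs = c(i, end), pred = c(j, j + 1))
    }
  }
  do.call(rbind, rows)
}

# Windows (1, 18), (2, 19), ..., (9, 26), all with last = 27
all_runs <- Map(run_window, 1:9, 18:26, MoreArgs = list(last = 27, by = 2))
print(head(all_runs[[2]], 3))
```

Passing the window end into the function is what lets obs change from c(i, 18) to c(i, 19) and so on without manual edits.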
I have a for loop which returns 4 different answers, which are correct, but when I try to store these values in my data.frame I get "Error in [<-.data.frame(*tmp*, p, 1, value = 29.1520685791182) :
missing values are not allowed in subscripted assignments of data frames"
Goal: I'm trying to get the printed values (29, 485, -14, 12) into a data.frame.
library("xts")
library("quantmod")
library("fredr")
Tesla <- getSymbols("TSLA", from=as.Date("2014-11-03"),to=as.Date("2019-11-03"))
Amazon <- getSymbols("AMZN", from=as.Date("2014-11-03"),to=as.Date("2019-11-03"))
Equinor <- getSymbols("EQNR",from="2014-11-03",to="2019-11-03")
FTSE100 <- getSymbols("^FTSE",from="2014-11-03",to="2019-11-03")
dftest <- data.frame(merge(TSLA$TSLA.Close, AMZN$AMZN.Close, EQNR$EQNR.Close,FTSE$FTSE.Close))
df <- data.frame(matrix(nrow = 1, ncol = 4)) #The data.frame where i want my returned values from print(pros) to be in.
colnames(df) <- c("TESLA", "AMAZON","EQUINOR","FTSE")
for (p in dftest) {
pros <- ((last(as.numeric(p)))-(first(as.numeric(p))))/(first(as.numeric(p)))*100
print(pros) #this print out 29, 485,-14,12
df[p,1] <- pros #the problem
}
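A likely cause of the error: for (p in dftest) iterates over the columns of the data frame, so p is a whole vector of prices, and df[p, 1] then uses those prices (including any NA values) as row indices. The sketch below shows one way around this by computing one value per column and building the result in a single step. It uses made-up prices instead of downloaded data, and pct_change and prices are hypothetical names.

```r
# Made-up closing prices standing in for the merged quantmod data
prices <- data.frame(
  TESLA = c(100, 120, 129),
  AMAZON = c(50, 180, 292.5),
  EQUINOR = c(20, 18, 17.2),
  FTSE = c(100, 105, 112)
)

# Percentage change from the first to the last non-missing value
pct_change <- function(p) {
  p <- as.numeric(p)
  p <- p[!is.na(p)]
  (p[length(p)] - p[1]) / p[1] * 100
}

# One value per column, collected into a one-row data frame
result <- as.data.frame(lapply(prices, pct_change))
print(result)
```

If a loop is preferred, iterating over seq_along(prices) and assigning to result[1, k] would index by column position rather than by price values.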
I am working on an assignment for school. I need to transform the columns in a data frame using a for loop and the bcPower function from the car package. My data frame, named bb2.df, consists of 13 columns of baseball statistics for 337 players. The data is from:
http://ww2.amstat.org/publications/jse/datasets/baseball.dat.txt
I read the data in using:
bb.df <- read.fwf("baseball.dat.txt",widths=c(4,6,6,4,4,3,3,3,4,4,4,3,3,2,2,2,2,19))
And then I created a second data frame just for the numeric stats using:
bb2.df <- bb.df[,1:13]
library(car)
Then I unsuccessfully tried to build the for loop.
> bb2.df[[i]] <- bcPower(bb2.df[[i]],c)
> for (i in 1:ncol(bb2.df)) {
+ c <- coef(powerTransform(bb2.df[[i]]))
+ bb2.df[[i]] <- bcPower(bb2.df[[i]],c)
+ }
Error in bc1(out[, j], lambda[j]) :
First argument must be strictly positive.
The loop seems to transform the first three columns but stops.
What am I doing wrong?
This solution tests whether a column appears to contain logical values and omits such columns from the transformation; replaces zero values in the vectors with a small number outside the range of the actual values; and stores the transformed values in a new data frame, retaining the column and row names.
I have also tested all of the variables for normality before and after the transformation. I tried to find a variable that is interesting in that the transformed variable has a large p-value for the Shapiro test, and in that there was also a large change in the p-value. Finally, the interesting variable is scaled in both the original and transformed version, and the two versions are overlaid on a density plot.
library(car); library(ggplot2); library(reshape2)
# see this link for column names and type hints
# http://ww2.amstat.org/publications/jse/datasets/baseball.txt
# add placeholder column for opening quotation mark
bb.df <-
read.fwf(
"http://ww2.amstat.org/publications/jse/datasets/baseball.dat.txt",
widths = c(4, 6, 6, 4, 4, 3, 3, 3, 4, 4, 4, 3, 3, 2, 2, 2, 2, 2, 17)
)
# remove placeholder column
bb.df <- bb.df[,-(ncol(bb.df) - 1)]
names(bb.df) <- make.names(
c(
'Salary', 'Batting average', 'OBP', 'runs', 'hits', 'doubles', 'triples',
'home runs', 'RBI', 'walks', 'strike-outs', 'stolen bases', 'errors',
"free agency eligibility", "free agent in 1991/2" ,
"arbitration eligibility", "arbitration in 1991/2", 'name'
)
)
# test for boolean/logical values... don't try to transform them
logicals.test <- apply(
bb.df,
MARGIN = 2,
FUN = function(one.col) {
asnumeric <- as.numeric(one.col)
aslogical <- as.logical(asnumeric)
renumeric <- as.numeric(aslogical)
matchflags <- renumeric == asnumeric
cant.be.logical <- any(!matchflags)
print(cant.be.logical)
}
)
logicals.test[is.na(logicals.test)] <- FALSE
probably.numeric <- bb.df[, logicals.test]
result <- apply(probably.numeric, MARGIN = 2, function(one.col)
{
# can't transform vectors containing non-positive values
# replace zeros with something small
non.zero <- one.col[one.col > 0]
small <- min(non.zero) / max(non.zero)
zeroless <- one.col
zeroless[zeroless == 0] <- small
c <- coef(powerTransform(zeroless))
transformation <- bcPower(zeroless, c)
return(transformation)
})
result <- as.data.frame(result)
row.names(result) <- bb.df$name
cols2test <- names(result)
normal.before <- sapply(cols2test, function(one.col) {
print(one.col)
temp <- shapiro.test(bb.df[, one.col])
return(temp$p.value)
})
normal.after <- sapply(cols2test, function(one.col) {
print(one.col)
temp <- shapiro.test(result[, one.col])
return(temp$p.value)
})
more.normal <- cbind.data.frame(normal.before, normal.after)
more.normal$more.normal <-
more.normal$normal.after / more.normal$normal.before
more.normal$interest <-
more.normal$normal.after * more.normal$more.normal
interesting <-
rownames(more.normal)[which.max(more.normal$interest)]
data2plot <-
cbind.data.frame(bb.df[, interesting], result[, interesting])
names(data2plot) <- c("original", "transformed")
data2plot <- scale(data2plot)
data2plot <- melt(data2plot)
names(data2plot) <- c("Var1", "dataset", interesting)
ggplot(data2plot, aes(x = data2plot[, 3], fill = dataset)) +
geom_density(alpha = 0.25) + xlab(interesting)
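The zero-replacement step used in the solution above can be illustrated without the car package. The sketch below uses the textbook Box-Cox formula, (x^lambda - 1) / lambda, with log(x) for lambda = 0; box_cox and replace_zeros are hypothetical helper names, the hits vector is made up, and the details of car's own implementation may differ.

```r
# Textbook Box-Cox transform; like bcPower, it requires strictly positive input
box_cox <- function(x, lambda) {
  if (any(x <= 0)) stop("First argument must be strictly positive.")
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Replace zeros with a small value below the smallest positive observation
replace_zeros <- function(x) {
  positive <- x[x > 0]
  small <- min(positive) / max(positive)
  x[x == 0] <- small
  x
}

hits <- c(0, 3, 12, 40, 0, 7)  # made-up column containing zeros

# The raw column fails, which mirrors the error from the question
raw_attempt <- try(box_cox(hits, 0.5), silent = TRUE)
print(inherits(raw_attempt, "try-error"))

# After replacing zeros the transform goes through
zeroless <- replace_zeros(hits)
print(box_cox(zeroless, 0.5))
```

Whether replacing zeros with a small constant is statistically appropriate depends on the data; it is shown here only because the solution above uses it.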
Original, incomplete answer:
I believe you're trying to do illegal power transformations (vectors that include non-positive values, specifically zeros, and vectors with no variance).
The fact that you are copying bb.df into bb2.df and then overwriting is a sure sign that you should really be using apply.
This doesn't create a useful data frame, but it should get you started:
library(car)
bb.df <-
read.fwf(
"baseball.dat.txt",
widths = c(4, 6, 6, 4, 4, 3, 3, 3, 4, 4, 4, 3, 3, 2, 2, 2, 2, 19)
)
bb.df[bb.df == 0] <- NA
# skip last (text) col
for (i in 1:(ncol(bb.df) - 1)) {
print(i)
# use comma to indicate indexing by column
temp <- bb.df[, i]
temp[temp == 0] <- NA
temp <- temp[complete.cases(temp)]
if (length(unique(temp)) > 1) {
c <- coef(powerTransform(bb.df[, i]))
print(bcPower(bb.df[, i], c))
} else {
print(paste0("column ", i, " is invariant"))
}
}
# apply solution
result <- apply(bb.df[,-ncol(bb.df)], MARGIN = 2, function(one.col)
{
temp <- one.col
temp[temp == 0] <- NA
temp <- temp[complete.cases(temp)]
if (length(unique(temp)) > 1) {
c <- coef(powerTransform(temp))
transformation <- bcPower(temp, c)
return(transformation)
} else
{
print("skipping invariant column")
return(NULL)
}
})
I'm trying to append a row to an existing dataframe in R. The dataframe represents a subject and I want to update this with newly (generated) data. When I run this, the index numbers of the dataframe become strange:
1,
2,
21,
211,
2111,
21111, etc.
These are not practical to read.
How do I get 'normal' index numbers (1, 2, 3, 4, etc.)?
x <- 10
y <- 463
dat <- data.frame(x,y)
for (i in 1:10) {
dat.sub <- dat[nrow(dat),] # select the last row from 'dat'
dat.sub <- within(dat.sub, { # within that selection update the objects
x <- x+1
y <- y+1
})
dat <- rbind(dat, dat.sub, deparse.level = 2) # attach updated row to the 'dat'
}
dat
dat[3,]
I think the problem is that dat.sub is a data.frame that keeps the row name of the row it was taken from, so after the second row rbind has to make the duplicated names unique. The easiest fix is to change the class of dat.sub so that it carries no row names. One way is:
dat.sub <- c(within(dat.sub, { # within that selection update the objects
x <- x+1
y <- y+1
}))
Wrap the within() call in c() inside your for loop, which turns dat.sub into a plain list without row names.
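Another option, sketched here as an alternative rather than as the approach above, is to build each new row as a fresh data frame and reset the row names once at the end with rownames(dat) <- NULL. The variable names last and new_row are introduced only for this example.

```r
dat <- data.frame(x = 10, y = 463)

for (i in 1:10) {
  last <- dat[nrow(dat), ]                                # last row so far
  new_row <- data.frame(x = last$x + 1, y = last$y + 1)   # fresh row, no inherited name
  dat <- rbind(dat, new_row)
}

# Reset to the default 1, 2, 3, ... row names
rownames(dat) <- NULL
print(dat[3, ])
```

Resetting the row names with NULL also works on the data frame produced by the original loop, if changing the loop itself is not desired.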