How to bind columns and rename at the same time? - r

I want to save residuals from a linear model to a dataframe. I was trying to do it with the line of code (note that this was supposed to go inside a loop):
resi <- NULL
resi <- cbind(resi, colnames(dados[1])=residuals(m))
Here I intended to save the residuals vector from my model m under the same column name from the dados object (which is basicaly a date), but I get the error:
Error: unexpected '=' in "resi <- cbind(resi, colnames(dados[1])="

You want `colnames <- ()`.
cbind(d, `colnames<-`(d, letters[1:4]))
# X1 X2 X3 X4 a b c d
# 1 1 4 7 10 1 4 7 10
# 2 2 5 8 11 2 5 8 11
# 3 3 6 9 12 3 6 9 12
It's similar to setNames() but also compatible with matrices.
Toydata
d <- data.frame(matrix(1:12, 3, 4))

It is possible to do this in tibble
library(tibble)
tibble(resi, !!colnames(dados)[1] :=residuals(m))

Related

write a function in R to loop through an equation and use last value for next round in loop

Using R 3.3.2.
My equation is:
Lt_1<-Lt +(Linf-Lt)*(1-exp(-K))
Values for equation parameters:
Lt<-40 #only value that changes
Linf<-139.086
K<-0.413
year=c(0:10)
I want to loop through the equation for the length of year but need to have Lt start as 40, but then take the value of Lt_1 from the last time the equation was calculated.
What I have tried:
#dataframe to old output
new<- as.data.frame(matrix( 0, nrow=length(year), ncol= 2))
predict_length<-for(i in seq_along(year)){
Lt_1<-Lt +(Linf-Lt)*(1-exp(-K))
new[1,2]<-Lt
new[i,2]<-Lt_1
new[i,1]<-data[i]
}
new
Output:
V1 V2
1 0 40.00000
2 1 73.52453
3 2 73.52453
4 3 73.52453
5 4 73.52453
6 5 73.52453
7 6 73.52453
8 7 73.52453
9 8 73.52453
10 9 73.52453
11 10 73.52453
The loop isn't working - the second LT_1 is repeated for the remainder of the data frame.
Consider using Reduce, the common higher order function found in other languages, since you are essentially nesting equation calls together:
eq <- function(Lt) Lt + (Linf-Lt)*(1-exp(-K))
Lt_1 <- eq(Lt)
Lt_2 <- eq(eq(Lt))
Lt_3 <- eq(eq(eq(Lt)))
Hence, wrap Reduce inside an sapply that iteratively with rep passes the input Lt values at successively increasing times:
new <- as.data.frame(matrix( 0, nrow=length(year), ncol= 2))
new$V1 <- year
new$V2 <- sapply(year, function(i)
Reduce(function(x, y) eq(x), rep(40, i+1)))
new
# V1 V2
# 1 0 40.00000
# 2 1 73.52453
# 3 2 95.70645
# 4 3 110.38339
# 5 4 120.09456
# 6 5 126.52008
# 7 6 130.77161
# 8 7 133.58468
# 9 8 135.44598
# 10 9 136.67754
# 11 10 137.49241
The main question OP had raised is about reason why loop is not working for him. The main reason is that for all calculations Lt has been used in formula.
The two changes will be needed in for-loop:
1. Declare Lt_1 out of for-loop
2. Use Lt-1 inplace of Lt1 in formula.
The modified code will be:
Lt<-40 #only value that changes
Linf<-139.086
K<-0.413
year=c(0:10)
#dataframe to old output
new<- as.data.frame(matrix( 0, nrow=length(year), ncol= 2))
new[1,1]<- 0
new[1,2]<- Lt
Lt_1 <- Lt;
for(i in 2:length(year)){
Lt_1<-Lt_1 +(Linf-Lt_1)*(1-exp(-K))
new[i,2]<-Lt_1
new[i,1]<- i
}
new
# V1 V2
#1 0 40.00000
#2 2 73.52453
#3 3 95.70645
#4 4 110.38339
#5 5 120.09456
#6 6 126.52008
#7 7 130.77161
#8 8 133.58468
#9 9 135.44598
#10 10 136.67754
#11 11 137.49241
Here's a quick and dirty solution:
lt0 <- 40
cfunc <- function(lt){
linf <- 139.086
k <- 0.413
lt1 <- lt + (linf-lt)*(1-exp(-k))
}
year = 0:10
val <- c(cfunc(lt0), rep(0, length(year)-1))
for(i in 2:length(year)){
val[i] <- cfunc(val[i-1])
}
output:
> val
[1] 73.52453 95.70645 110.38339 120.09456 126.52008 130.77161 133.58468 135.44598 136.67754 137.49241
[11] 138.03158
There ought to be a better way than using a for loop

Iterated plotting from list of list of dataframes

I'm trying to explore a large dataset, both with data frames and with charts. I'd like to analyze the distribution of each variable by different metrics (e.g., sum(x), sum(x*y)) and for different sub-populations. I have 4 sub-populations, 2 metrics, and many variables.
In order to accomplish that, I've made a list structure such as this:
$variable1
...$metric1 <--- that's a df.
...$metric2
$variable2
...$metric1
...$metric2
Inside one of the data_frames (e.g., list$variable1$metric1), I've calculated distributions of the unique values for variable1 and for each of the four population groups (represented in columns). It looks like this:
$variable1$metric1
unique_values med_all med_some_not_all med_at_least_some med_none
1 (1) 12-17 Years Old NA NA NA NA
2 (2) 18-25 Years Old 0.278 0.317 0.278 0.317
3 (3) 26-34 Years Old 0.225 0.228 0.225 0.228
4 (4) 35 or Older 0.497 0.456 0.497 0.456
$variable1$metric2
unique_values med_all med_some_not_all med_at_least_some med_none
1 (1) 12-17 Years Old NA NA NA NA
2 (2) 18-25 Years Old 0.544 0.406 0.544 0.406
3 (3) 26-34 Years Old 0.197 0.310 0.197 0.310
4 (4) 35 or Older 0.259 0.284 0.259 0.284
What I'm trying to figure out is a good way to loop through the list of lists (probably melting the DFs in the process) and then output a ton of bar charts. In this case, the natural plot format would be, for each dataframe, a stacked bar chart with one stacked bar for each sub-population, grouping by the variable's unique values.
But I'm not familiar with iterated plotting and so I've hit a dead end. How might I plot from that list structure? Alternately, is there a better structure in which i should be storing this information?
I find nested lists to be pretty tricky to work with, so I would combine them all into a single data frame that labels the name of the variable and the name of the metric:
lst <- list(alpha= list(a= data.frame(matrix(1:4, 2)), b= data.frame(matrix(6:9, 2))), beta = list(c = data.frame(matrix(11:14, 2))))
level1 <- lapply(lst, function(x) do.call(rbind, lapply(names(x), function(y) {x[[y]]$metric=y ; x[[y]]})))
dat <- do.call(rbind, lapply(names(level1), function(x) {level1[[x]]$variable=x ; level1[[x]]}))
dat
# X1 X2 metric variable
# 1 1 3 a alpha
# 2 2 4 a alpha
# 3 6 8 b alpha
# 4 7 9 b alpha
# 5 11 13 c beta
# 6 12 14 c beta
Now you can use standard tools for manipulating a single data frame to perform your data analysis.
here's a start:
lst <- list(alpha= list(a= data.frame(matrix(1:4, 2)), b= data.frame(matrix(6:11, 2))),
beta = list(c = data.frame(matrix(11:14, 2))))
lst
$alpha
$alpha$a
X1 X2
1 1 3
2 2 4
$alpha$b
X1 X2 X3
1 6 8 10
2 7 9 11
$beta
$beta$c
X1 X2
1 11 13
2 12 14
#We can subset by number or by name
lst[['alpha']]
$a
X1 X2
1 1 3
2 2 4
$b
X1 X2 X3
1 6 8 10
2 7 9 11
lst[[1]]
$a
X1 X2
1 1 3
2 2 4
$b
X1 X2 X3
1 6 8 10
2 7 9 11
#The dollar sign naming convention reminds us that we are looking at a list.
#Let's sum the columns of both data frames in the alpha list
lapply(lst[['alpha']], colSums)
$a
X1 X2
3 7
$b
X1 X2 X3
13 17 21
Let's try to find the sum of each column of each data frame:
lapply(lst, colSums)
Error in FUN(X[[i]], ...) :
'x' must be an array of at least two dimensions
What happened? R is correctly refusing to run an array function on a list. The function colSums needs to be fed data frames, matrices, and other arrays above one-dimension. We have to nest an lapply function inside of another one. The logic can get complicated:
lapply(lst, function(x) lapply(x, colSums))
$alpha
$alpha$a
X1 X2
3 7
$alpha$b
X1 X2 X3
13 17 21
$beta
$beta$c
X1 X2
23 27
We can use rbind to put data.frames together:
rbind(lst$alpha$a, lst$beta$c)
X1 X2
1 1 3
2 2 4
3 11 13
4 12 14
Be sure not to do it the way you might be thinking (I've done it many times):
do.call(rbind, lst)
a b
alpha List,2 List,3
beta List,2 List,2
That isn't the result you're looking for. And make sure that the dimensions and column names are the same:
do.call(rbind, lst[[1]])
Error in rbind(deparse.level, ...) :
numbers of columns of arguments do not match
R is refusing to combine data frames that have 2 columns in one (alpha$a) and three columns in the other (alpha$b).
I changed the lst to make alpha$b have two columns like the others and combined them:
bind1 <- lapply(lst2, function(x) do.call(rbind, x))
bind1
$alpha
X1 X2
a.1 1 3
a.2 2 4
b.1 6 9
b.2 7 10
b.3 8 11
$beta
X1 X2
c.1 11 13
c.2 12 14
That combines the elements of each list. Now I can combine the outer list to make one big data frame.
do.call(rbind, bind1)
X1 X2
alpha.a.1 1 3
alpha.a.2 2 4
alpha.b.1 6 9
alpha.b.2 7 10
alpha.b.3 8 11
beta.c.1 11 13
beta.c.2 12 14
Here's a strategy based on melting a list (recursively),
lst = list(alpha= list(a= data.frame(matrix(1:4, 2)),
b= data.frame(matrix(6:11, 2))),
beta = list(c = data.frame(matrix(11:14, 2))))
library(reshape2)
m = melt(lst, id=1:2)
library(ggplot2)
ggplot(m, aes(X1,X2)) + geom_bar(stat="identity") + facet_grid(L1~L2)

Place on specific positions of dataframes specific values

DF <- data.frame(x1=c(NA,7,7,8,NA), x2=c(1,4,NA,NA,4)) # a data frame with NA
WhereAreMissingValues <- which(is.na(DF), arr.ind=TRUE) # find the position of the missing values
Modes <- apply(DF, 2, function(x) {which(tabulate(x) == max(tabulate(x)))}) # find the modes of each column
DF
WhereAreMissingValues
Modes
I would like to replace the NAs of each column of DF with the mode, accordingly.
Please for some help.
Map provides here a one line solution:
data.frame(Map(function(u,v){u[is.na(u)]=v;u},DF, Modes))
# x1 x2
#1 7 1
#2 7 4
#3 7 4
#4 8 4
#5 7 4
Here's how I would do this.
First I'll define an helper function
Myfunc <- function(x) as.numeric(names(sort(-table(x)))[1L])
Then just use lapply over the data set
DF[] <- lapply(DF, function(x){x[is.na(x)] <- Myfunc(x) ; x})
DF
# x1 x2
# 1 7 1
# 2 7 4
# 3 7 4
# 4 8 4
# 5 7 4

R Dataframe comparison which, scaling bad

The idea is extracting the position of df charactes with a reference of other df, example:
L<-LETTERS[1:25]
A<-c(1:25)
df<-data.frame(L,A)
Compare<-c(LETTERS[sample(1:25, 25)])
df[] <- lapply(df, as.character)
for (i in 1:nrow(df)){
df[i,1]<-which(df[i,1]==Compare)
}
head(df)
L A
1 14 1
2 12 2
3 2 3
This works good but scale very bad, like all for, any ideas with apply, or dplyr?
Thanks
Just use match
Your data (use set.seed when providing data using sample)
df <- data.frame(L = LETTERS[1:25], A = 1:25)
set.seed(1)
Compare <- LETTERS[sample(1:25, 25)]
Solution
df$L <- match(df$L, Compare)
head(df)
# L A
# 1 10 1
# 2 23 2
# 3 12 3
# 4 11 4
# 5 5 5
# 6 21 6

Putting output in R into excel

Guys I have a code that generates 2 columns of data (e.g Number, Median) which refers to a particular person...but I have taken samples of 7 people
so basically I get this output:
[[1]
Number Median
1 5
2 3
.....
[[2]]
Number Median
1 6
2 4
....
[[3]]
Number Median
1 3
2 5
So I basically get this output....up til [[7]]
I tried transferring this output in excel using this code
write.csv(cbind(data),"data1.csv")
and I get this type of output:
list(c(Median =.......It lists all the median on the rows
But I want it to save the data referring to the 'median' and 'Number' in columns NOT ROWS
If I just type
write.csv(data,"data1.csv")
I get an error
arguments imply differing number of rows: 157, 179, 178, 180
As Marius said, you have a list of data.frames which can't be written to a .csv file. You need to do:
NewDataFrame <- do.call("rbind", YourList)
write.csv(NewDataFrame, "Data.csv")
do.call takes each of the elements from a list and applies whatever function you tell it (in this case rbind) to all of them.
Here are two options. Both use the following sample data:
myList <- list(data.frame(matrix(1:4, ncol = 2)),
data.frame(matrix(3:10, ncol = 2)),
data.frame(matrix(11:14, ncol =2)))
myList
# [[1]]
# X1 X2
# 1 1 3
# 2 2 4
#
# [[2]]
# X1 X2
# 1 3 7
# 2 4 8
# 3 5 9
# 4 6 10
#
# [[3]]
# X1 X2
# 1 11 13
# 2 12 14
Option 1: Write a csv file where the data.frames are presented as they are in the list
sink("list_of_dataframes.csv", type="output")
invisible(lapply(myList, function(x) dput(write.csv(x))))
sink()
If you open the resulting "list_of_dataframes.csv" file in a text editor, you will get something that looks like this. When you read this into a spreadsheet program, the first column will include the rownames and NULL separating each data.frame:
"","X1","X2"
"1",1,3
"2",2,4
NULL
"","X1","X2"
"1",3,7
"2",4,8
"3",5,9
"4",6,10
NULL
"","X1","X2"
"1",11,13
"2",12,14
NULL
Option 2: Write or search around for a version of cbind that accommodates binding data.frames with differing number of rows.
Here is one such function that I've written.
cbind2 <- function(datalist) {
nrows <- max(sapply(datalist, nrow))
expandmyrows <- function(mydata, rowsneeded) {
temp1 = names(mydata)
rowsneeded = rowsneeded - nrow(mydata)
temp2 = setNames(data.frame(
matrix(rep(NA, length(temp1) * rowsneeded),
ncol = length(temp1))), temp1)
rbind(mydata, temp2)
}
do.call(cbind, lapply(datalist, expandmyrows, rowsneeded = nrows))
}
And here is that function applied to your list:
cbind2(myList)
# X1 X2 X1 X2 X1 X2
# 1 1 3 3 7 11 13
# 2 2 4 4 8 12 14
# 3 NA NA 5 9 NA NA
# 4 NA NA 6 10 NA NA
That output should be easy for you to use with write.csv and related functions.

Resources