Problems with full_join in R. no applicable method to "character" - r

I am new to R and I am struggling with the full_join function. I am pretty sure the problem is simple: I got it working in other situations that I assume were the same as this one. Hopefully someone can help. Here goes:
I have several datasets within a big list:
NDVI2003 <- ls(pattern = "x2003_meanNDVI_m.*$")
PixelQa2003 <- ls(pattern = "x2003_meanPixelQa_m.*$")
full_list <- do.call(c, list(NDVI2003,PixelQa2003))
The first 2 functions just grab some files from a folder. These files look like:
> str(x2003_meanNDVI_m1)
'data.frame': 354 obs. of 5 variables:
$ date : chr "2001-12-03" "2001-12-10" "2001-12-19" "2001-12-26" ...
$ 2003_NDVI_1: num 0.441 0.518 0.322 0.311 0.499 0.319 0.163 0.134 0.452 0.536 ...
$ 2003_NDVI_2: num 0.377 0.446 0.075 0.1 0.006 0.279 0.368 0.135 0.423 0.522 ...
$ 2003_NDVI_3: num 0.332 0.397 0.07 0.093 0.006 0.236 0.469 0.127 0.411 0.535 ...
$ 2003_NDVI_4: num 0.653 0.621 0.536 0.064 0.652 0.576 0.52 0.158 0.666 0.663 ...
The 3rd function simply puts all these files together:
> head(full_list,20)
[1] "x2003_meanNDVI_m1" "x2003_meanNDVI_m2" "x2003_meanNDVI_m3" "x2003_meanNDVI_m4" "x2003_meanNDVI_m5"
[6] "x2003_meanNDVI_m6" "x2003_meanPixelQa_m1" "x2003_meanPixelQa_m2" "x2003_meanPixelQa_m3" "x2003_meanPixelQa_m4"
[11] "x2003_meanPixelQa_m5" "x2003_meanPixelQa_m6"
So far, very simple. Now comes the problem: I want to join all these files by the column 'date'. This very same procedure works in other scripts I built:
data2003 <- reduce(full_list, full_join, by="date")
But I keep getting an error:
> data2003 <- reduce(full_list, full_join, by="date")
Error in UseMethod("full_join") :
no applicable method for 'full_join' applied to an object of class "character"
So far, what I have tried:
- Changing the column type from character, to date, to number... Nothing.
- Altering the order in which the dplyr and plyr packages are loaded when opening R.
- Changing variable names and so on.
- Using full_lst <- list(NDVI2003,PixelQa2003) instead of full_list <- do.call(c, list(NDVI2003,PixelQa2003))
- Adding full_list <- mget(full_list)
- Googling for hours looking for an answer...
Any help will be really welcome.
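The error message points at the likely cause: ls() returns the names of objects as a character vector, so reduce() ends up calling full_join on two strings rather than on two data frames. Below is a minimal sketch of one way around that, assuming the data frames live in the current environment. The toy data frames and their column names are made up for illustration, and base R's merge(all = TRUE) stands in for full_join so the example runs without extra packages:

```r
# Toy stand-ins for the real data frames (hypothetical values and column names).
x2003_meanNDVI_m1 <- data.frame(date = c("2001-12-03", "2001-12-10"),
                                ndvi_1 = c(0.441, 0.518))
x2003_meanPixelQa_m1 <- data.frame(date = c("2001-12-10", "2001-12-19"),
                                   qa_1 = c(66, 68))

# ls() returns object NAMES (a character vector), not the objects themselves.
full_names <- c(ls(pattern = "^x2003_meanNDVI_m"),
                ls(pattern = "^x2003_meanPixelQa_m"))
class(full_names)

# mget() looks the names up and returns a named list of data frames.
full_list <- mget(full_names)

# Full outer join on "date", folded over the whole list.
data2003 <- Reduce(function(a, b) merge(a, b, by = "date", all = TRUE),
                   full_list)
data2003
```

With dplyr and purrr loaded, reduce(mget(full_names), full_join, by = "date") should behave the same way. If mget() appeared not to help, it is worth checking that reduce() was run on the mget() result and not on the original character vector.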

Related

How to make purrr invoke_map work with closures

In order to create a function to deal with moving averages, I bumped into this problem. Using dplyr and purrr, I tried to generate a list of closures.
v <- 5
funs <- map(1:v, ~ . %>% lag(n = .x) )
It turns out that, although funs[[1]](rnorm(100)) and funs[[2]](rnorm(100)) work, I could not get this line to work:
invoke_map(funs, rnorm(100))
Why does this happen?
invoke_map isn't sure how you want it to iterate. It's a very flexible function, which sometimes iterates across the functions, sometimes across the parameters, and sometimes across both. To make it explicit that you only want it to iterate across the functions, name the parameter that should receive the random vector (x = ...). Doing so is easier if you keep a traditional function structure instead of a functional sequence:
library(purrr)
set.seed(47)
funs <- map(1:5, ~partial(dplyr::lag, n = .x))
funs %>%
  invoke_map(x = rnorm(10)) %>%
  str(vec.len = 10)
#> List of 5
#> $ : num [1:10] NA 1.9947 0.7111 0.1854 -0.2818 0.1088 -1.0857 -0.9855 0.0151 -0.252
#> $ : num [1:10] NA NA 1.9947 0.7111 0.1854 -0.2818 0.1088 -1.0857 -0.9855 0.0151
#> $ : num [1:10] NA NA NA 1.995 0.711 0.185 -0.282 0.109 -1.086 -0.985
#> $ : num [1:10] NA NA NA NA 1.995 0.711 0.185 -0.282 0.109 -1.086
#> $ : num [1:10] NA NA NA NA NA 1.995 0.711 0.185 -0.282 0.109
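If the goal is only to apply every closure to the same vector, an explicit loop over the functions removes the ambiguity altogether. Here is a sketch in base R, where lag_by and make_lag are hypothetical helpers standing in for dplyr::lag and the purrr lambda:

```r
# Hypothetical base-R lag helper standing in for dplyr::lag.
lag_by <- function(x, n) c(rep(NA, n), head(x, -n))

# One closure per lag; force(n) pins the value when the closure is created.
make_lag <- function(n) {
  force(n)
  function(x) lag_by(x, n)
}
funs <- lapply(1:5, make_lag)

set.seed(47)
x <- rnorm(10)

# Iterate over the functions only, passing the same x to each one.
lagged <- lapply(funs, function(f) f(x))
str(lagged, vec.len = 10)
```

Because the loop variable is the function itself, there is no question about which argument is being iterated over.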

writing small dataframe to csv creates a huge file

I am trying to write the dataframe T_df to a csv file, but the saved "TFile.csv" grows to approx. 50GB on the Microsoft Azure / R server. Has anyone experienced something similar and can advise?
Example:
write.csv(T_df,"TFile.csv")
This creates a 50GB file, while the dataframe is not that big:
object.size(T_df)
2449776 bytes
str(T_df)
'data.frame': 101994 obs. of 3 variables:
I don't know if there's something special about your particular data, but I don't see this when I run Microsoft R Server version 9.3.0.
> T_df <- data.frame(a = runif(101994), b = runif(101994), c = runif(101994))
> object.size(T_df)
2448752 bytes
> str(T_df)
'data.frame': 101994 obs. of 3 variables:
$ a: num 0.248 0.504 0.197 0.634 0.407 ...
$ b: num 0.226 0.686 0.556 0.629 0.412 ...
$ c: num 0.959 0.122 0.214 0.666 0.23 ...
>
> write.csv(T_df,"TFile.csv")
TFile.csv is 6.1 M
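One way to narrow this down is to write the data frame to a temporary file and compare the size on disk with the in-memory size and the column types. This is a small sketch with made-up data (the 1000-row data frame is an assumption, not the asker's T_df):

```r
# Made-up data frame with three numeric columns.
T_df <- data.frame(a = runif(1000), b = runif(1000), c = runif(1000))

# Write to a temporary file so nothing in the working directory is touched.
out <- tempfile(fileext = ".csv")
write.csv(T_df, out)

object.size(T_df)     # size in memory
file.size(out)        # size on disk, in bytes
sapply(T_df, class)   # column types; anything other than atomic vectors deserves a look
```

If one of the columns turns out to be a list, a matrix, or a nested data frame, that would be a plausible place to start looking, since str() in the question only shows the first line of its output.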

Accessing dataframes after splitting a dataframe

I'm splitting a dataframe into multiple dataframes using the command
data <- apply(data, 2, function(x) data.frame(sort(x, decreasing=F)))
I don't know how to access them all at once. I know I can access each one using data$`1`, but then I have to do that for every dataframe:
df1<- head(data$`1`,k)
df2<- head(data$`2`,k)
Can I get these dataframes in one go (e.g. by storing them in some structure) without changing the row indexes of the individual dataframes?
str(data) gives
List of 2
$ 7:'data.frame': 7 obs. of 1 variable:
..$ sort.x..decreasing...F.: num [1:7] 0.265 0.332 0.458 0.51 0.52 ...
$ 8:'data.frame': 7 obs. of 1 variable:
..$ sort.x..decreasing...F.: num [1:7] 0.173 0.224 0.412 0.424 0.5 ...
str(data[1:2])
List of 2
$ 7:'data.frame': 7 obs. of 1 variable:
..$ sort.x..decreasing...F.: num [1:7] 0.265 0.332 0.458 0.51 0.52 ...
$ 8:'data.frame': 7 obs. of 1 variable:
..$ sort.x..decreasing...F.: num [1:7] 0.173 0.224 0.412 0.424 0.5 ...
Thanks to @r2evans I got it done. Here is his code from the comments:
Yes. Two short demos: lapply(data, head, n=2), or more generically
sapply(data, function(df) mean(df$x)). – r2evans
After that, I fetched the indexes:
df<-lapply(df, rownames)
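The two steps can be sketched end to end as follows. The list below is a made-up stand-in for the result of the apply() call, with a column named x as in the comment, and k is an arbitrary choice:

```r
# Made-up list of single-column data frames, named like the original columns.
data <- list(
  `7` = data.frame(x = sort(c(0.52, 0.265, 0.51, 0.332, 0.458))),
  `8` = data.frame(x = sort(c(0.5, 0.173, 0.424, 0.224, 0.412)))
)
k <- 2

# Take the first k rows of every data frame in one go; list names are kept.
top_k <- lapply(data, head, n = k)
top_k[["7"]]

# Row names survive head(), so the original indexes can be read back.
row_ids <- lapply(top_k, rownames)
row_ids
```

The result is still a list, so each piece stays reachable by name (top_k[["7"]]) or by position (top_k[[1]]).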

apply create columns function to a list r

I am new to using apply and functions together, and I am stuck and frustrated. I have 2 different lists of data frames, and I need to add a certain number of columns to the first one when a condition related to the second one is fulfilled. Below is the structure of the first list, which has one data frame per station; every data frame has 2 or more columns, one for each pressure:
> str(KDzlambdaEG)
List of 3
$ 176:'data.frame': 301 obs. of 3 variables:
..$ 0 : num [1:301] 0.186 0.182 0.18 0.181 0.177 ...
..$ 5 : num [1:301] 0.127 0.127 0.127 0.127 0.127 ...
..$ 20: num [1:301] 0.245 0.241 0.239 0.236 0.236 ...
$ 177:'data.frame': 301 obs. of 2 variables:
..$ 0 : num [1:301] 0.132 0.132 0.132 0.13 0.13 ...
..$ 25: num [1:301] 0.09 0.092 0.0902 0.0896 0.0896 ...
$ 199:'data.frame': 301 obs. of 2 variables:
..$ 0 : num [1:301] 0.181 0.182 0.181 0.182 0.179 ...
..$ 10: num [1:301] 0.186 0.186 0.185 0.183 0.184 ...
On the other hand, the second list holds the number of columns that I need to add after every column of each data frame in the first list:
> str(dif)
List of 3
[[176]]
[1] 4 15 28
[[177]]
[1] 24 67
[[199]]
[1] 9 53
I've tried tonnes of things, even this, using the append_col function that appears in:
How to add a new column between other dataframe columns?
for (i in 1:length(dif)){
  A<-lapply(KDzlambdaEG,append_col,rep(list(NA),dif[[i]][1]),after=1)
}
but nothing seems to work so far... I have searched for answers here, but it's difficult to find specific ones as a newcomer.
Try:
indxlst <- lapply(dif, function(x) c(1, x[-length(x)]+1, x[length(x)]))
newdflist <- lapply(indxlst, function(x) data.frame(matrix(0, 2, sum(x))))
for(i in 1:length(newdflist)) {
  newdflist[[i]][indxlst[[i]]] <- KDzlambdaEG[[i]]
}
Reproducible Data Test
df1 <- data.frame(x=1:2, y=c("Jan", "Feb"), z=c("A", "B"))
df3 <- df2 <- df1[,-3]
KDzlambdaEG <- list(df1,df2,df3)
x1 <- c(4,15,28)
x2 <- c(24,67)
x3 <- c(9, 53)
dif <- list(x1,x2,x3)
indxlst <- lapply(dif, function(x) c(1, x[-length(x)]+1, x[length(x)]))
newdflist <- lapply(indxlst, function(x) data.frame(matrix(0, 2, sum(x))))
for(i in 1:length(newdflist)) {
  newdflist[[i]][indxlst[[i]]] <- KDzlambdaEG[[i]]
}
newdflist
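Another way to read the requirement is "after column j of each data frame, insert dif[[i]][j] empty columns". Here is a sketch under that assumption. pad_columns is a hypothetical helper, the "_pad" column names are invented, and the small lists are stand-ins for the real data:

```r
# Hypothetical helper: insert n_after[j] NA columns after column j of df.
pad_columns <- function(df, n_after) {
  stopifnot(length(n_after) == ncol(df))
  pieces <- lapply(seq_along(df), function(j) {
    # Block of NA filler columns that follows column j.
    filler <- as.data.frame(matrix(NA_real_, nrow(df), n_after[j]))
    names(filler) <- sprintf("%s_pad%d", names(df)[j], seq_len(n_after[j]))
    cbind(df[j], filler)
  })
  do.call(cbind, pieces)
}

# Small stand-ins for the two lists (values are made up).
KDzlambdaEG <- list(
  `176` = data.frame(`0` = c(0.186, 0.182), `5` = c(0.127, 0.127),
                     check.names = FALSE),
  `177` = data.frame(`0` = c(0.132, 0.132), check.names = FALSE)
)
dif <- list(`176` = c(4, 2), `177` = 3)

# Map() walks both lists in parallel, one station at a time.
padded <- Map(pad_columns, KDzlambdaEG, dif)
names(padded[["176"]])
```

Map() pairs each data frame with its own vector of counts, which avoids the explicit index loop and keeps the station names on the result.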

Mysterious source of output in R?

I am using the following code with the mtcars data and the factanal function for factor analysis. Printing fit$loadings gives the proportion of variance, but it does not seem to be there in str(fit$loadings):
> fit <- factanal(mtcars, 3, rotation="varimax")
> fit$loadings
Loadings:
Factor1 Factor2 Factor3
mpg 0.643 -0.478 -0.473
cyl -0.618 0.703 0.261
disp -0.719 0.537 0.323
hp -0.291 0.725 0.513
drat 0.804 -0.241
wt -0.778 0.248 0.524
qsec -0.177 -0.946 -0.151
vs 0.295 -0.805 -0.204
am 0.880
gear 0.908 0.224
carb 0.114 0.559 0.719
Factor1 Factor2 Factor3
SS loadings 4.380 3.520 1.578
Proportion Var 0.398 0.320 0.143 <<<<<<<<<<<<< I NEED THESE NUMBERS AS A VECTOR
Cumulative Var 0.398 0.718 0.862
>
> str(fit$loadings)
loadings [1:11, 1:3] 0.643 -0.618 -0.719 -0.291 0.804 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:11] "mpg" "cyl" "disp" "hp" ...
..$ : chr [1:3] "Factor1" "Factor2" "Factor3"
How can I get the proportion-of-variance vector from fit$loadings? Thanks for your help.
Let obj <- fit$loadings. Here is the complete path to the result.
By writing fit$loadings (or obj) we actually call print(obj). So, after looking at str, you might want to check what the specific print method does with obj. To know which method to look for, we check class(obj) and get "loadings".
Then, writing print.loadings does not give anything because the function is not exported. Since factanal is in the stats package, we call stats:::print.loadings and get the complete source code of the function. By inspecting it, we see that we can get the desired result as follows:
colSums(obj^2) / nrow(obj)
# Factor1 Factor2 Factor3
# 0.3982190 0.3199652 0.1434125
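Putting the steps together as a short sketch: the names ss, prop_var, and cum_var are mine, and the cumulative row is assumed to be a running sum of the proportions, which matches the printed table above:

```r
fit <- factanal(mtcars, 3, rotation = "varimax")
obj <- fit$loadings

# The class tells us which print method produced the extra rows.
class(obj)
# stats:::print.loadings   # uncomment to read the method's source

# Reproduce the three summary rows as plain named vectors.
ss       <- colSums(obj^2)     # "SS loadings"
prop_var <- ss / nrow(obj)     # "Proportion Var"
cum_var  <- cumsum(prop_var)   # "Cumulative Var"

prop_var
```

prop_var is an ordinary named numeric vector, so prop_var[["Factor1"]] or unname(prop_var) can be used from there.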

Resources