I've created this data frame and want to access its individual elements for plotting, but it seems I can't. What kind of data frame have I created, and how can I access its individual elements?
> print(df)
B.mean B.conf1 B.conf2
1 0.75000000 -0.18826132 1.68826132
2 0.66666667 0.01334534 1.31998799
3 0.33333333 -0.31998799 0.98665466
> names(df)
[1] "B"
> str(df)
'data.frame': 3 obs. of 1 variable:
$ B: num [1:3, 1:3] 0.75 0.6667 0.3333 -0.1883 0.0133 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr "mean" "conf1" "conf2"
The 'B' column is a matrix, as is evident from the str of 'df'. Calling data.frame through do.call converts it into 3 separate columns of a data.frame.
do.call(data.frame, df)
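To make this concrete, here is a minimal sketch that rebuilds a data frame shaped like the one in the question (the construction via `data.frame(row.names = 1:3)` is my assumption about how such an object can arise, not necessarily how the asker created it) and then flattens it:

```r
# Rebuild a data frame with a single column 'B' that is itself a 3 x 3 matrix
# (values copied from the printed output in the question).
m <- matrix(
  c(0.75, 0.66666667, 0.33333333,
    -0.18826132, 0.01334534, -0.31998799,
    1.68826132, 1.31998799, 0.98665466),
  ncol = 3,
  dimnames = list(NULL, c("mean", "conf1", "conf2"))
)
df <- data.frame(row.names = 1:3)  # 3 rows, no columns yet
df$B <- m                          # the matrix is stored as ONE column

names(df)                 # only "B"
df$B[, "mean"]            # index the matrix column directly

# Flatten: each matrix column becomes its own data frame column
flat <- do.call(data.frame, df)
names(flat)
flat$B.conf1              # now reachable with $
```

Either route works for plotting: index the matrix column as `df$B[, "mean"]`, or flatten once and use `flat$B.mean`.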
I have a large list of lists. There are 46 lists in "output". Each list is a tibble with differing number of rows and columns. My immediate goal is to subset a specific column from each list.
This is str(output) of the first two lists to give you an idea of the data.
> str(output)
List of 46
$ Brain :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 6108 obs. of 8 variables:
..$ p_val : chr [1:6108] "0" "1.60383253411205E-274" "0" "0" ...
..$ avg_diff : num [1:6108] 1.71 1.7 1.68 1.6 1.58 ...
..$ pct.1 : num [1:6108] 0.998 0.808 0.879 0.885 0.923 0.905 0.951 0.957 0.619 0.985 ...
..$ pct.2 : num [1:6108] 0.677 0.227 0.273 0.323 0.36 0.384 0.401 0.444 0.152 0.539 ...
..$ cluster : num [1:6108] 1 1 1 1 1 1 1 1 1 1 ...
..$ gene : chr [1:6108] "Plp1" "Mal" "Ermn" "Stmn4" ...
..$ X__1 : logi [1:6108] NA NA NA NA NA NA ...
..$ Cell Type: chr [1:6108] "Myelinating oligodendrocyte" NA NA NA ...
$ Bladder :Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 4656 obs. of 8 variables:
..$ p_val : num [1:4656] 0.00 1.17e-233 2.85e-276 0.00 0.00 ...
..$ avg_diff : num [1:4656] 2.41 2.23 2.04 2.01 1.98 ...
..$ pct.1 : num [1:4656] 0.833 0.612 0.855 0.987 1 0.951 0.711 0.544 0.683 0.516 ...
..$ pct.2 : num [1:4656] 0.074 0.048 0.191 0.373 0.906 0.217 0.105 0.044 0.177 0.106 ...
..$ cluster : num [1:4656] 1 1 1 1 1 1 1 1 1 1 ...
..$ gene : chr [1:4656] "Dpt" "Gas1" "Cxcl12" "Lum" ...
..$ X__1 : logi [1:4656] NA NA NA NA NA NA ...
..$ Cell Type: chr [1:4656] "Stromal cell_Dpt high" NA NA NA ...
Since I have a large number of lists that make up the list, I have been trying to create an iterative code to perform tasks. This hasn't been successful.
I can achieve this manually, or list by list, but I haven't been successful in finding an iterative way of doing this.
x <- data.frame(output$Brain, stringsAsFactors = FALSE)
tmp.list <- x$Cell.Type
tmp.output <- purrr::discard(tmp.list, is.na)
x <- subset(x, Cell.Type %in% tmp.output)
This gives me the output that I want, which are the rows in the column "Cell.Type" with non-NA values.
I got as far as the code below to get the 8th column of each list, which is the "Cell.Type" column.
lapply(output, "[", , 8)
But I found that the name and position of the "Cell.Type" column are not consistent across the lists. This means I cannot use lapply to subset the 8th column, because in some lists the column sits elsewhere, for example in the 9th position.
I tried the code below, but it does not work and gets an error.
lapply(output, "[", , c('Cell.Type', 'celltyppe'))
#Error: Column `celltyppe` not found
#Call `rlang::last_error()` to see a backtrace
Essentially, from my "output" list, I want to subset either columns "Cell.Type" or "celltyppe" from each of the 46 lists to create a new list with 46 lists of just a single column of values. Then I want to drop all rows with NA.
I would like to perform this using some sort of loop.
At the moment I have not had much success. lapply seems to be able to extract columns across lists iteratively, but I am having difficulty subsetting named columns.
Once I can do this, I then want to create a loop that can subset only rows without NA.
FINAL CODE
This is the final code I used, and it does exactly what I had hoped for. The first line loops over each list in the large list. The second line selects the columns of each list whose name contains "ell" (Cell type, Cell Type, or celltyppe). The last line removes any rows with NA.
purrr::map(output, ~ .x %>%
dplyr::select(matches("ell")) %>%
na.omit)
We can use an anonymous function call:
lapply(output, function(x) na.omit(x[grep("(?i)Cell\\.?(?i)Typp?e", names(x))]))
#[[1]]
# Cell.Type
#1 1
#2 2
#3 3
#4 4
#5 5
#[[2]]
# celltyppe
#1 7
#2 8
#3 9
#4 10
#5 11
Also with purrr
library(tidyverse)
map(output, ~ .x %>%
select(matches("(?i)Cell\\.?(?i)Typp?e")) %>%
na.omit)
data
output <- list(data.frame(Cell.Type = 1:5, col1 = 6:10, col2 = 11:15),
data.frame(coln = 1:5, celltyppe = 7:11))
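If you would rather not rely on a regular expression, a minimal base R sketch is to keep an explicit vector of accepted spellings and pick whichever one is present. The `candidates` vector, the helper name `pick_cell_type`, and the toy data with NA values are my own assumptions for illustration, not part of the answers above:

```r
# Toy stand-in for 'output' (hypothetical values; NAs added on purpose)
output <- list(
  data.frame(Cell.Type = c("a", NA, "b"), col1 = 1:3),
  data.frame(coln = 1:3, celltyppe = c(NA, "c", NA))
)

# Spellings we are willing to accept (assumed; extend as needed)
candidates <- c("Cell.Type", "Cell Type", "celltyppe")

# Hypothetical helper: keep the first matching column, then drop NA rows
pick_cell_type <- function(d) {
  hit <- intersect(candidates, names(d))
  if (length(hit) == 0) return(NULL)   # no cell-type column in this element
  na.omit(d[hit[1]])                   # one-column data frame without NA rows
}

res <- lapply(output, pick_cell_type)
res
```

The trade-off is that a new misspelling has to be added to `candidates` by hand, whereas the regex approach picks it up automatically (and may also pick up columns you did not intend).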
I am still learning R.
I am using library(usdm) in R, where I call vifcor(vardata,th=0.4,maxobservations =50000) to find the variables that are not multicollinear. I need to get the result of vifcor(vardata,th=0.4,maxobservations =50000) into a structured data frame for further analysis.
Data reading process I am using:
performdata <- read.csv('F:/DGDNDRV_FINAL/OutputTextFiles/data_blk.csv')
vardata <- performdata[, c(names(performdata[5:length(names(performdata))-2]))]
Content of the csv file:
pointid grid_code Blocks_line_dst_CHT GrowthCenter_dst_CHT Roads_nationa_dst_CHT Roads_regiona_dst_CHT Settlements_CHT_line_dst_CHT Small_Hat_Bazar_dst_CHT Upazilla_lin_dst_CHT resp
1 6 150 4549.428711 15361.31836 3521.391846 318.9043884 3927.594727 480 1
2 6 127.2792206 4519.557617 15388.68457 3500.24292 342.0526123 3902.883545 480 1
3 2 161.5549469 4484.473145 15391.6377 3436.539063 335.4101868 3844.216553 540 1
My tries:
r<-vifcor(vardata,th=0.2,maxobservations =50000) returns
2 variables from the 6 input variables have collinearity problem:
Roads_regiona_dst_CHT GrowthCenter_dst_CHT
After excluding the collinear variables, the linear correlation coefficients ranges between:
min correlation ( Small_Hat_Bazar_dst_CHT ~ Roads_nationa_dst_CHT ): -0.04119076963
max correlation ( Small_Hat_Bazar_dst_CHT ~ Settlements_CHT_line_dst_CHT ): 0.1384278434
---------- VIFs of the remained variables --------
Variables VIF
1 Blocks_line_dst_CHT 1.026743892
2 Roads_nationa_dst_CHT 1.010556752
3 Settlements_CHT_line_dst_CHT 1.038307666
4 Small_Hat_Bazar_dst_CHT 1.026943711
class(r) returns
[1] "VIF"
attr(,"package")
[1] "usdm"
mode(r) returns "S4"
I need Roads_regiona_dst_CHT and GrowthCenter_dst_CHT in one data frame and the VIFs of the remaining variables in another data frame!
But nothing I tried worked!
Basically, the returned result is an S4 object, and you can extract its slots via the @ operator:
library(usdm)
example(vifcor) # creates 'v2'
str(v2)
# Formal class 'VIF' [package "usdm"] with 4 slots
# ..@ variables: chr [1:10] "Bio1" "Bio2" "Bio3" "Bio4" ...
# ..@ excluded : chr [1:5] "Bio5" "Bio10" "Bio7" "Bio6" ...
# ..@ corMatrix: num [1:5, 1:5] 1 0.0384 -0.3011 0.0746 0.7102 ...
# .. ..- attr(*, "dimnames")=List of 2
# .. .. ..$ : chr [1:5] "Bio1" "Bio2" "Bio3" "Bio8" ...
# .. .. ..$ : chr [1:5] "Bio1" "Bio2" "Bio3" "Bio8" ...
# ..@ results :'data.frame': 5 obs. of 2 variables:
# .. ..$ Variables: Factor w/ 5 levels "Bio1","Bio2",..: 1 2 3 4 5
# .. ..$ VIF : num [1:5] 2.09 1.37 1.25 1.27 2.31
So you can now extract the excluded and results slots via:
v2@excluded
# [1] "Bio5" "Bio10" "Bio7" "Bio6" "Bio4"
v2@results
# Variables VIF
# 1 Bio1 2.086186
# 2 Bio2 1.370264
# 3 Bio3 1.253408
# 4 Bio8 1.267217
# 5 Bio9 2.309479
You should be able to use the command below to get the information in the slot 'results' into a data frame. You can then split the information out into separate data frames using traditional methods.
df <- r@results
Note that r@results[1:2,2] would give you the VIF for the first two rows.
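For the two data frames asked for in the question, the same slot access can be sketched without usdm installed. The class `ToyVIF` below is a hypothetical stand-in that I define only so the example runs on its own; it mimics just the `excluded` and `results` slots, filled with the values printed in the question:

```r
library(methods)

# 'ToyVIF' is a hypothetical stand-in for the usdm 'VIF' class
setClass("ToyVIF",
         representation(excluded = "character", results = "data.frame"))

r <- new("ToyVIF",
  excluded = c("Roads_regiona_dst_CHT", "GrowthCenter_dst_CHT"),
  results = data.frame(
    Variables = c("Blocks_line_dst_CHT", "Roads_nationa_dst_CHT",
                  "Settlements_CHT_line_dst_CHT", "Small_Hat_Bazar_dst_CHT"),
    VIF = c(1.026743892, 1.010556752, 1.038307666, 1.026943711)
  )
)

slotNames(r)                                  # which slots exist

# Data frame 1: the excluded (collinear) variables
excluded_df <- data.frame(Variables = r@excluded)

# Data frame 2: VIFs of the remaining variables (already a data frame)
vif_df <- r@results

excluded_df
vif_df
```

With the real object, the same two lines (`data.frame(Variables = r@excluded)` and `r@results`) should apply, assuming the slot names match the `str()` output shown above.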
I've used the package haven to read SPSS data into R. All seems ok, except that when I try to subset the data it doesn't seem to behave correctly. Here's the code (I don't have SPSS to create example data and can't post the real stuff):
require(haven)
df <- read_spss("filename1.sav")
tmp <- df[as_factor(df$variable1) == "factor1",]
tmp <- tmp[!is.na(tmp$variable2), ]
The above df has "NA" scattered throughout. I expected the above to subset only the data, keeping only rows with variable1 with "factor1" and discarding all rows with NAs in variable2. The first subset works as expected. But the second subset does not. It removes rows, but NAs are still present.
I suspect the issue has something to do with the way haven structures the imported data and uses the class labelled instead of an actual factor variable, but it's over my head. Anyone know what could be happening and how to accomplish the same?
Here's the structure of df, variable1 and variable2:
> str(df)
'data.frame': 4573 obs. of 316 variables:
> str(df$variable1)
Class 'labelled' atomic [1:4573] 9 9 9 14 8 8 2 4 8 16 ...
..- attr(*, "labels")= Named num [1:18] 1 2 3 4 5 6 7 8 9 10 ...
.. ..- attr(*, "names")= chr [1:18] "factor1" "factor2" "factor3" "factor4" ...
> str(df$variable2)
Class 'labelled' atomic [1:4573] 3 NA 3 NA 3 NA 1 1 NA NA ...
..- attr(*, "labels")= Named num [1:3] 1 2 3
.. ..- attr(*, "names")= chr [1:3] "Sponsor" "Not a Sponsor" "Don't Know"
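Without the real data this is only a guess, not a diagnosis, but two general subsetting behaviours in R may be worth ruling out. The toy data frame and column names below are hypothetical and use plain numeric columns rather than haven's labelled class:

```r
# Hypothetical toy data: NA in more than one column
tmp <- data.frame(variable2 = c(3, NA, 1), variable3 = c(NA, 2, 5))

# 1) Filtering on one column only removes NAs in THAT column
kept <- tmp[!is.na(tmp$variable2), ]
any(is.na(kept))                    # other columns can still hold NA
complete <- tmp[complete.cases(tmp), ]  # drops rows with NA in ANY column

# 2) An NA inside a logical row index produces an all-NA row
cond <- c(TRUE, NA, FALSE)
with_na_row <- tmp[cond, ]          # second row of the result is all NA
without <- tmp[which(cond), ]       # which() silently drops the NA
```

If the remaining NAs are in columns other than variable2, the first point explains them; if whole rows of NA appear after the comparison with "factor1", the second point is the more likely cause.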
I tried to find the subset, but it shows an error.
I am performing Data Envelopment Analysis using the Benchmarking package in R.
I saw that similar questions had been asked before, but they didn't help me.
Update: structure and summary of the database
I am performing DEA for V6 and V7.
I guess you need
Large.Cap$V1[e_crs$eff > 0.85]
Using a reproducible example from ?dea
library(Benchmarking)
x <- matrix(c(100,200,300,500,100,200,600),ncol=1)
y <- matrix(c(75,100,300,400,25,50,400),ncol=1)
Large.Cap <- data.frame(v1= LETTERS[1:7], v2= 1:7, stringsAsFactors = TRUE) # factor column, so droplevels() below applies
e_crs <- dea(x, y, RTS='crs', ORIENTATION='in')
e_crs
#[1] 0.7500 0.5000 1.0000 0.8000 0.2500 0.2500 0.6667
The e_crs object is a list
str(e_crs)
#List of 12
# $ eff : num [1:7] 0.75 0.5 1 0.8 0.25 ...
# $ lambda : num [1:7, 1:7] 0 0 0 0 0 0 0 0 0 0 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:7] "L1" "L2" "L3" "L4" ...
# $ objval : num [1:7] 0.75 0.5 1 0.8 0.25 ...
# $ RTS : chr "crs"
# $ primal : NULL
# $ dual : NULL
# $ ux : NULL
# $ vy : NULL
# $ gamma :function (x)
# $ ORIENTATION: chr "in"
# $ TRANSPOSE : logi FALSE
# $ param : NULL
# - attr(*, "class")= chr "Farrell"
We extract the 'eff' list element from 'e_crs' to subset the 'v1' column in 'Large.Cap' dataset.
droplevels(Large.Cap$v1[e_crs$eff > 0.85])
#[1] C
#Levels: C
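The same logical index can also pull out whole rows rather than one column. A minimal sketch, assuming the efficiency scores printed above; here `e_crs` is a plain list standing in for the real dea() result, so the block runs without the Benchmarking package:

```r
Large.Cap <- data.frame(v1 = LETTERS[1:7], v2 = 1:7)

# Stand-in for the dea() result: only the 'eff' element, values as printed above
e_crs <- list(eff = c(0.75, 0.5, 1, 0.8, 0.25, 0.25, 0.6667))

# Logical vector with one entry per row of Large.Cap
keep <- e_crs$eff > 0.85

efficient_rows <- Large.Cap[keep, ]      # all columns for the efficient units
efficient_rows

# Keep the score next to the units for later inspection
scored <- cbind(Large.Cap, eff = e_crs$eff)
scored[order(-scored$eff), ]
```

This relies on `eff` having one value per row of the data frame, in the same order as the rows that went into dea().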
I've got the optim function in r returning a list of stuff like this:
[[354]]
r k sigma
389.4 354.0 354.0
but when I try accessing, say, list$sigma, it doesn't exist and returns NULL.
I've tried attach, I've tried names, and I've tried assigning it to a matrix, but none of these worked.
Does anyone have any idea how I can access the lowest or highest value of sigma, r or k in my list?
Many thanks!!
str gives me this output:
List of 354
$ : Named num [1:3] -55.25 2.99 119.37
..- attr(*, "names")= chr [1:3] "r" "k" "sigma"
$ : Named num [1:3] -53.91 4.21 119.71
..- attr(*, "names")= chr [1:3] "r" "k" "sigma"
$ : Named num [1:3] -41.7 14.6 119.2
So I've got a double within a list within a list (?). I'm still mystified as to how I can cycle through the list and pick out one element meeting my conditions without writing a function from scratch.
The key issue is that you have a list of lists (or a list of data.frames, which in fact is also a list).
To confirm this, take a look at is(list[[354]]).
The solution is simply to add an additional level of indexing. Below you have multiple alternatives of how to accomplish this.
You can use a vector as an index to [[, so for example, if you want to access the third element of the 354th element, you can use
myList[[ c(354, 3) ]]
You can also use character indices; however, all nested levels must then have names.
names(myList) <- as.character(1:length(myList))
myList[[ c("5", "sigma") ]]
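As a small runnable illustration of both forms, here is a sketch using a three-element list built from the values in the str output of the question (shortened from 354 elements; the name `myList` follows the examples above):

```r
# Three named numeric vectors, values taken from the str() output in the question
myList <- list(
  c(r = -55.25, k = 2.99, sigma = 119.37),
  c(r = -53.91, k = 4.21, sigma = 119.71),
  c(r = -41.7,  k = 14.6, sigma = 119.2)
)

# Numeric path: third value of the second element,
# equivalent to myList[[2]][[3]]
by_position <- myList[[c(2, 3)]]

# Character path: every level on the path needs names
names(myList) <- as.character(seq_along(myList))
by_name <- myList[[c("2", "sigma")]]

by_position
by_name
```

Both calls walk the same path, so they return the same sigma value.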
Lastly, please try to avoid using names like list, data, df, etc. for your own objects. Doing so leads to crashing code and errors that seem unexplainable and mysterious until you realize that you have tried to subset a function.
Edit:
In response to your question in the comments above: If you want to see the structure of an object (ie the "makeup" of the object), use str
> str(myList)
List of 5
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num 0.654
..$ b : num -0.0823
..$ sigma: num -31
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num -0.656
..$ b : num -0.167
..$ sigma: num -49
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num 0.154
..$ b : num 0.522
..$ sigma: num -89
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num 0.676
..$ b : num 0.595
..$ sigma: num 145
$ :'data.frame': 1 obs. of 3 variables:
..$ a : num -0.75
..$ b : num 0.772
..$ sigma: num 6
If you want -for example- all the sigmas, you can use sapply:
sapply(list, function(x)x["sigma"])
You can use that to find the minimum and maximum:
range(sapply(list, function(x)x["sigma"]))
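To go from the extreme values back to the list elements that produced them, which.min and which.max give the positions. A minimal sketch on a three-element stand-in for the real list (the name `fits` is my own; the values come from the str output in the question):

```r
# Stand-in for the 354-element list
fits <- list(
  c(r = -55.25, k = 2.99, sigma = 119.37),
  c(r = -53.91, k = 4.21, sigma = 119.71),
  c(r = -41.7,  k = 14.6, sigma = 119.2)
)

# vapply is like sapply but checks that every result is a single number
sigmas <- vapply(fits, function(x) x[["sigma"]], numeric(1))

range(sigmas)                 # smallest and largest sigma
i_min <- which.min(sigmas)    # position of the smallest sigma
i_max <- which.max(sigmas)    # position of the largest sigma

fits[[i_min]]                 # full r / k / sigma set with the smallest sigma
fits[[i_max]]                 # full r / k / sigma set with the largest sigma
```

Swapping "sigma" for "r" or "k" in the vapply call gives the same lookup for the other parameters.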
Using do.call, you can do this:
do.call('[[', list(mylist, 354))['sigma']
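do.call is arguably more useful here for stacking the whole list into one matrix, after which ordinary column indexing works. A minimal sketch on a three-element stand-in for the real list (values from the str output in the question), assuming every element has the same three names:

```r
mylist <- list(
  c(r = -55.25, k = 2.99, sigma = 119.37),
  c(r = -53.91, k = 4.21, sigma = 119.71),
  c(r = -41.7,  k = 14.6, sigma = 119.2)
)

# Element access through do.call: the arguments go in a list.
# Equivalent to mylist[[2]]["sigma"]
one <- do.call(`[[`, list(mylist, 2))["sigma"]

# Stack all elements row-wise: one row per fit, columns r / k / sigma
tab <- do.call(rbind, mylist)

tab[, "sigma"]                       # all sigmas at once
best_k <- tab[which.min(tab[, "k"]), ]  # the row with the smallest k
best_k
```

Once the results are in `tab`, conditions on several parameters at once (for example `tab[tab[, "k"] > 3 & tab[, "sigma"] < 119.5, ]`) no longer need a loop.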