I have managed to subset and lapply a list of data frames as follows:
subsetDeathHA <-
na.omit(subset(outcome, select = c("Hospital Name", "mortalityRate", "State")))
orderSubsetDeathHA <-
subsetDeathHA[order(subsetDeathHA$mortalityRate, subsetDeathHA$`Hospital Name`, subsetDeathHA$State), ]
splitOrderSubsetDeathHA <-
split(orderSubsetDeathHA, orderSubsetDeathHA$State)
aa <- lapply(splitOrderSubsetDeathHA, function(x) { x[num, ] })
num is the ranking number on a per State basis.
Using str(aa) shows this object is a list of (54) data.frames, where each data.frame is one object of 3 variables as follows:
List of 54
$ AK:'data.frame': 1 obs. of 3 variables:
..$ Hospital Name : chr NA
..$ mortalityRate : num NA
..$ State : chr NA
..- attr(*, "na.action")=Class 'omit' Named int [1:1986] 4 5 6 10 13 17 19 23 27 28 ...
.. .. ..- attr(*, "names")= chr [1:1986] "4" "5" "6" "10" ...
$ AL:'data.frame': 1 obs. of 3 variables:
..$ Hospital Name : chr "D C H REGIONAL MEDICAL CENTER"
..$ mortalityRate : num 15.8
..$ State : chr "AL"
..- attr(*, "na.action")=Class 'omit' Named int [1:1986] 4 5 6 10 13 17 19 23 27 28 ...
.. .. ..- attr(*, "names")= chr [1:1986] "4" "5" "6" "10" ...
What I can't seem to do is the following:
1) Drop the mortalityRate variable so that only Hospital Name and State remain, and return a list of the resulting 54 data frames.
2) Place row.names = FALSE appropriately to suppress the row indices that R prints.
3) Remove the NA rows. I thought I had omitted the NA values in the first subsetting operation,
but they still appear when I print(aa). A sample of the output follows:
$AK
Hospital Name mr State
NA NA <NA> NA <NA>
$AL
Hospital Name mr State
56 D C H REGIONAL MEDICAL CENTER 15.8 AL
etc...
Any help/suggestions appreciated
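A minimal sketch of one way to handle all three points, using made-up data (the toy `outcome` below and `num <- 2` are assumptions, not the real hospital file). The NA rows most likely appear because `x[num, ]` asks for a row that does not exist when a state has fewer than `num` hospitals; `na.omit` cannot remove those, since they are created after it runs.

```r
# Toy stand-in for the real `outcome` data (hypothetical values)
outcome <- data.frame(
  "Hospital Name" = c("A", "B", "C", "D"),
  mortalityRate = c(12.1, 15.8, NA, 10.4),
  State = c("AL", "AL", "AL", "AK"),
  check.names = FALSE, stringsAsFactors = FALSE
)
num <- 2  # rank to extract within each state

clean <- na.omit(outcome)
ordered <- clean[order(clean$mortalityRate, clean$`Hospital Name`), ]
by_state <- split(ordered, ordered$State)

ranked <- lapply(by_state, function(x) {
  # Indexing past the last row is what produces the all-NA row
  if (num > nrow(x)) return(NULL)
  # Keep only the two wanted columns, which drops mortalityRate
  x[num, c("Hospital Name", "State")]
})
ranked <- Filter(Negate(is.null), ranked)  # drop states with too few hospitals

# row.names = FALSE belongs in the print call, not in the subsetting
invisible(lapply(ranked, print, row.names = FALSE))
```

If states with too few hospitals should stay in the list, the `Filter` line can be dropped and the `NULL` replaced by a one-row data frame of NAs.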
My data has 1,000 entries and here is the str of the first 2 elements:
> str(my_boots[1:2])
List of 2
$ :List of 4
..$ result : Named num [1:10] 0.118 0.948 4.317 1.226 1.028 ...
.. ..- attr(*, "names")= chr [1:10] "(Intercept)" "pvi2" "freqchal" "sexexp" ...
..$ output : chr "list()"
..$ warnings: chr(0)
..$ messages: chr(0)
$ :List of 4
..$ result : Named num [1:10] 0.202 0.995 2.512 1.057 0.5 ...
.. ..- attr(*, "names")= chr [1:10] "(Intercept)" "pvi2" "freqchal" "sexexp" ...
..$ output : chr "list()"
..$ warnings: chr(0)
..$ messages: chr(0)
The fields of interest are $result and $warnings. I want to return a tibble whose columns are the names within the named vector result, keeping only the entries where warnings is empty (that is, no warning was raised).
I'm new to purrr, but I was able to get most of the way there using map_dfr(my_boots[1:2], "result"). This returns a tibble with the column names taken from the named numeric vector, but I would like to return only the rows where the entry under warnings is empty.
I wasn't sure how to create this structure manually, but I was able to create a single element of my_boots:
test <- list(
list("result" = c("alpha" = 1.1, "beta" = 2.1, "theta" = 3.1, "blah" = 4.1),
"warnings" = c("blah"))
)
Also: I'm using the tidyverse - thank you.
Starting with some dummy data.
library(tidyverse)
l <- list(
list(
result = 1:10,
warnings = character(0)
),
list(
result = 2:20,
warnings = "warn"
),
list(
result = 3:30,
warnings = character(0)
),
list(
result = 4:40,
warnings = "warn"
)
)
Use keep to keep only elements without warnings. map("result") pulls the result element out of each list.
l %>%
keep(~is_empty(.$warnings)) %>%
map("result")
#> [[1]]
#> [1] 1 2 3 4 5 6 7 8 9 10
#>
#> [[2]]
#> [1] 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
#> [22] 24 25 26 27 28 29 30
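To get from the kept results to the tibble the question asks for, each named result vector can be turned into a one-row tibble and the rows bound together. This is only a sketch: the `boots` list below is hypothetical data shaped like my_boots, with named numeric results.

```r
library(purrr)
library(dplyr)
library(tibble)

# Hypothetical data shaped like my_boots: named results plus a warnings field
boots <- list(
  list(result = c(alpha = 1.1, beta = 2.1), warnings = character(0)),
  list(result = c(alpha = 9.9, beta = 9.9), warnings = "did not converge"),
  list(result = c(alpha = 1.3, beta = 2.3), warnings = character(0))
)

clean <- boots %>%
  keep(~ is_empty(.x$warnings)) %>%      # drop elements that raised a warning
  map("result") %>%                      # pull out the named numeric vectors
  map(~ as_tibble(as.list(.x))) %>%      # one row per vector, names become columns
  bind_rows()

clean
```

The second element is dropped because its warnings field is not empty, so `clean` has two rows with columns alpha and beta.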
I am working on an economic research project and have a data frame of regression coefficients built with the melt and tidy functions (tidy is from the broom package). My df:
> head(LmModGDP, 10)
Country variable term estimate std.error statistic p.value
1 Netherlands FDI_InFlow_MilUSD (Intercept) 5.354083e+02 5.974760e+01 8.961167 1.976417e-09
2 Netherlands FDI_InFlow_MilUSD value 2.400677e-03 1.409779e-03 1.702875 1.005189e-01
3 Netherlands FDI_InFlow_percGDP (Intercept) 6.184273e+02 6.723554e+01 9.197923 1.173719e-09
4 Netherlands FDI_InFlow_percGDP value -1.261933e+00 1.008740e+01 -0.125100 9.014067e-01
5 Netherlands FDI_InStock_MilUSD (Intercept) 3.110956e+02 2.719577e+01 11.439116 1.201802e-11
6 Netherlands FDI_InStock_MilUSD value 7.025298e-04 5.307147e-05 13.237429 4.620706e-13
7 Netherlands FDI_OutFlow_MilUSD (Intercept) 5.106762e+02 5.939921e+01 8.597356 4.465840e-09
8 Netherlands FDI_OutFlow_MilUSD value 1.920313e-03 8.646908e-04 2.220808 3.528536e-02
9 Netherlands FDI_OutFlow_percGDP (Intercept) 2.593453e+02 5.334202e+01 4.861932 4.838082e-05
10 Netherlands FDI_OutFlow_percGDP value 3.931491e+00 5.332541e-01 7.372641 7.896681e-08
After I filter the df using any method (plain subsetting or the dplyr package):
LmModGDP[LmModGDP$variable == "FDI_InStock_MilUSD",]
or
LmModGDP %>%
filter(variable == "FDI_InStock_MilUSD")
It returns the desired df, but when I hover the mouse over the last column (p.value) in the RStudio viewer, it is reported as "Unknown Column", even though the data are still correct. Also, str and class report the column as numeric, but the viewer shows something else.
My desired df:
Country variable term estimate std.error statistic p.value
5 Netherlands FDI_InStock_MilUSD (Intercept) 3.110956e+02 2.719577e+01 11.439116 1.201802e-11
6 Netherlands FDI_InStock_MilUSD value 7.025298e-04 5.307147e-05 13.237429 4.620706e-13
19 Romania FDI_InStock_MilUSD (Intercept) 3.122229e+01 3.313134e+00 9.423796 7.188216e-10
20 Romania FDI_InStock_MilUSD value 2.128223e-03 7.035679e-05 30.249006 8.588104e-22
When I use the kable function to display it in a markdown report, the p.value column shows only 0 values, not the actual ones.
Can someone help me?
UPDATE:
Here is the output of str:
Classes ‘grouped_df’, ‘tbl_df’, ‘tbl’ and 'data.frame': 28 obs. of 7 variables:
$ Country : chr "Netherlands" "Netherlands" "Netherlands" "Netherlands" ...
$ variable : Factor w/ 7 levels "FDI_InFlow_MilUSD",..: 1 1 2 2 3 3 4 4 5 5 ...
$ term : chr "(Intercept)" "value" "(Intercept)" "value" ...
$ estimate : num 535.4083 0.0024 618.4273 -1.2619 311.0956 ...
$ std.error: num 59.7476 0.00141 67.23554 10.0874 27.19577 ...
$ statistic: num 8.961 1.703 9.198 -0.125 11.439 ...
$ p.value : num 1.98e-09 1.01e-01 1.17e-09 9.01e-01 1.20e-11 ...
- attr(*, "vars")= chr "Country" "variable"
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 14
..$ : int 0 1
..$ : int 2 3
..$ : int 4 5
..$ : int 6 7
..$ : int 8 9
..$ : int 10 11
..$ : int 12 13
..$ : int 14 15
..$ : int 16 17
..$ : int 18 19
..$ : int 20 21
..$ : int 22 23
..$ : int 24 25
..$ : int 26 27
- attr(*, "group_sizes")= int 2 2 2 2 2 2 2 2 2 2 ...
- attr(*, "biggest_group_size")= int 2
- attr(*, "labels")='data.frame': 14 obs. of 2 variables:
..$ Country : chr "Netherlands" "Netherlands" "Netherlands" "Netherlands" ...
..$ variable: Factor w/ 7 levels "FDI_InFlow_MilUSD",..: 1 2 3 4 5 6 7 1 2 3 ...
..- attr(*, "vars")= chr "Country" "variable"
..- attr(*, "drop")= logi TRUE
I cannot comment yet, so I am writing this as an answer.
Could you show us the output of str(LmModGDP)? Maybe the df is nested, or it is not a plain data frame but has special properties. Have you tried forcing LmModGDP <- as.data.frame(LmModGDP)?
Have you tried forcing LmModGDP$p.value <- as.numeric(LmModGDP$p.value)?
Have you tried converting to a data.table to see whether the behavior is different after applying your filter to it?
UPDATE1:
Thanks for posting the str(). Your object is a "grouped_df". Have you tried ungroup(LmModGDP)?
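A small sketch of the ungroup suggestion, on a hypothetical four-row stand-in for LmModGDP. It also formats the p-values as text: as far as I can tell, kable rounds numeric columns according to its digits argument, which would explain why values like 1.2e-11 are displayed as 0, so converting them to strings in scientific notation beforehand is one possible workaround.

```r
library(dplyr)

# Hypothetical stand-in for the filtered LmModGDP, grouped like the original
coefs <- tibble(
  Country = c("Netherlands", "Netherlands", "Romania", "Romania"),
  variable = "FDI_InStock_MilUSD",
  term = rep(c("(Intercept)", "value"), 2),
  p.value = c(1.201802e-11, 4.620706e-13, 7.188216e-10, 8.588104e-22)
) %>%
  group_by(Country, variable)

flat <- coefs %>%
  ungroup() %>%  # removes the grouped_df class and its grouping metadata
  mutate(p.value = formatC(p.value, format = "e", digits = 2))  # keep tiny values readable

class(flat)
flat$p.value
```

The resulting `flat` can then be passed to kable; since p.value is now a character column, it is not rounded.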
How can I change a variable of class 'labelled' into a character variable that shows only the labels as strings? I only want to see the last attribute; see below for the structure of my variable.
Class 'labelled' atomic [1:918] 4 12 13 20 26 36 40 1 4 13 ...
..- attr(*, "format.spss")= chr "F8.0"
..- attr(*, "labels")= Named num [1:40] 1 2 3 4 5 6 7 8 9 10 ...
.. ..- attr(*, "names")= chr [1:40] "People management" "HR" "Self management" "Email" ...
There is no designated function, but converting to a factor and then to character works:
library(haven)
s2 <- labelled(c(1, 1, 2), c(Male = 1, Female = 2)) # example from haven::labelled
as.character(as_factor(s2)) # use haven::as_factor
@shiro has an excellent solution for this. Expounding a bit.
Here's a general solution for other problems like it:
library(haven)
s2 <- labelled(c(1, 1, 2), c(Male = 1, Female = 2)) # example from haven:labelled
# use the default str method to see what s2 is 'inside'
utils:::str.default(s2)
# 'haven_labelled' num [1:3] 1 1 2
# - attr(*, "labels")= Named num [1:2] 1 2
# ..- attr(*, "names")= chr [1:2] "Male" "Female"
# It's a numeric vector with an attribute, labels
# labels is itself a numeric vector with the attribute names
# names is a character vector
# So take the attribute's attribute and subset with the numerics
attr(attr(s2,'labels'),'names')[s2]
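One caveat with the positional lookup above: it uses the stored values as positions, so it relies on the label values being 1, 2, 3, ... in that order. A sketch of a lookup by value instead, in base R with made-up codes (the `codes` and `labels` vectors are assumptions standing in for the data and its "labels" attribute):

```r
# Hypothetical codes that are NOT 1, 2, 3, ... so positional indexing would fail
codes <- c(10, 10, 20, 30)
labels <- c(Male = 10, Female = 20)  # same shape as the "labels" attribute

# match() finds where each code sits in `labels`; unlabelled codes become NA
out <- names(labels)[match(codes, labels)]
out
```

For a labelled vector the same idea would read `names(lab)[match(unclass(s2), lab)]` with `lab <- attr(s2, "labels")`.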
I would like to make some specific calculations within a large dataset.
This is my MWE using an API call (the download takes only 3-4 seconds):
devtools::install_github('mingjerli/IMFData')
library(IMFData)
fdi_asst <- c("BFDA_BP6_USD","BFDAD_BP6_USD","BFDAE_BP6_USD")
databaseID <- "BOP"
startdate <- "1980-01-01"
enddate <- "2016-12-31"
checkquery <- FALSE
FDI_ASSETS <- as.data.frame(CompactDataMethod(databaseID, list(CL_FREA = "Q", CL_AREA_BOP = "", CL_INDICATOR_BOP= fdi_asst), startdate, enddate, checkquery))
My data frame FDI_ASSETS looks like this (I provide a picture instead of head() for convenience).
The last column, Obs, is a list; each element holds up to three more variables:
head(FDI_ASSETS$Obs)
[[1]]
#TIME_PERIOD #OBS_VALUE #OBS_STATUS
1 1980-Q1 30.0318922812441 <NA>
2 1980-Q2 23.8926174547104 <NA>
3 1980-Q3 26.599634375058 <NA>
4 1980-Q4 32.7522451203517 <NA>
5 1981-Q1 44.124979234001 <NA>
6 1981-Q2 35.9907120805994 <NA>
MY SCOPE
I want to do the following:
if/when the "#UNIT_MULT == 6" then divide the "#OBS_VALUE" in FDI_ASSETS$Obs by 1000
if/when the "#UNIT_MULT == 3" then divide the "#OBS_VALUE" in FDI_ASSETS$Obs by 1000000
UPDATE
Structure of FDI_ASSETS looks like this:
str(FDI_ASSETS)
'data.frame': 375 obs. of 6 variables:
$ #FREQ : chr "Q" "Q" "Q" "Q" ...
$ #REF_AREA : chr "FI" "MX" "MX" "TO" ...
$ #INDICATOR : chr "BFDAE_BP6_USD" "BFDAD_BP6_USD" "BFDAE_BP6_USD" "BFDAD_BP6_USD" ...
$ #UNIT_MULT : chr "6" "6" "6" "3" ...
$ #TIME_FORMAT: chr "P3M" "P3M" "P3M" "P3M" ...
$ Obs :List of 375
..$ :'data.frame': 147 obs. of 3 variables:
.. ..$ #TIME_PERIOD: chr "1980-Q1" "1980-Q2" "1980-Q3" "1980-Q4" ...
.. ..$ #OBS_VALUE : chr "30.0318922812441" "23.8926174547104" "26.599634375058" "32.7522451203517" ...
.. ..$ #OBS_STATUS : chr NA NA NA NA ...
..$ :'data.frame': 60 obs. of 2 variables:
.. ..$ #TIME_PERIOD: chr "2001-Q1" "2001-Q3" "2002-Q1" "2002-Q2" ...
.. ..$ #OBS_VALUE : chr "9.99999999748979E-05" "9.99999997475243E-05" "9.8999999998739E-05" "-9.90000000342661E-05" ...
..$ :'data.frame': 63 obs. of 2 variables:
.. ..$ #TIME_PERIOD: chr "2001-Q1" "2001-Q2" "2001-Q3" "2001-Q4" ...
.. ..$ #OBS_VALUE : chr "130.0149" "189.627" "3453.8319" "630.483" ...
..$ :'data.frame': 17 obs. of 2 variables:
I downloaded your data and it is quite complicated. I have removed my wrong answer so that you can get it answered by @akrun or someone similar :) I don't have the time to parse through it right now.
I found the following solution
list_assets <- list(FDI_ASSETS = FDI_ASSETS, Portfolio_ASSETS = Portfolio_ASSETS, other_invest_ASSETS = other_invest_ASSETS, fin_der_ASSETS = fin_der_ASSETS, Reserves = Reserves)
# Loop over the names so each modified data frame can be written back;
# `for (df in list_assets)` would only change a temporary copy.
for (nm in names(list_assets)) {
  df <- list_assets[[nm]]
  for (i in seq_along(df$"#UNIT_MULT")) {
    if (df$"#UNIT_MULT"[i] == "6") {
      df$Obs[[i]]$"#OBS_VALUE" <- as.numeric(df$Obs[[i]]$"#OBS_VALUE") / 1000
    } else if (df$"#UNIT_MULT"[i] == "3") {
      df$Obs[[i]]$"#OBS_VALUE" <- as.numeric(df$Obs[[i]]$"#OBS_VALUE") / 1000000
    }
  }
  list_assets[[nm]] <- df
}
Please let me know how I can modify the code in order to make it more efficient and avoid these loops.
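One possible way to drop the explicit loops is a named lookup for the divisor plus Map over the list column. This is only a sketch on a toy data frame: `toy` and `rescale_obs` are hypothetical names, and the column names are copied from the str output above. Unlike the loop, it converts every #OBS_VALUE to numeric, including rows whose multiplier is neither "6" nor "3".

```r
# Toy stand-in for one downloaded data frame (hypothetical values)
toy <- data.frame(unit = c("6", "3", "0"), stringsAsFactors = FALSE)
names(toy) <- "#UNIT_MULT"
toy$Obs <- list(
  data.frame("#OBS_VALUE" = c("1000", "2000"), check.names = FALSE, stringsAsFactors = FALSE),
  data.frame("#OBS_VALUE" = "5000000", check.names = FALSE, stringsAsFactors = FALSE),
  data.frame("#OBS_VALUE" = "7", check.names = FALSE, stringsAsFactors = FALSE)
)

rescale_obs <- function(df) {
  # Look up the divisor for each row; unknown multipliers get 1 (no scaling)
  divisor <- unname(c("6" = 1e3, "3" = 1e6)[df[["#UNIT_MULT"]]])
  divisor[is.na(divisor)] <- 1
  # Walk the list column and the divisors in parallel
  df$Obs <- Map(function(obs, d) {
    obs[["#OBS_VALUE"]] <- as.numeric(obs[["#OBS_VALUE"]]) / d
    obs
  }, df$Obs, divisor)
  df
}

out <- rescale_obs(toy)
out$Obs[[1]]
```

With the real data, the same function could be applied to every data frame at once with `list_assets <- lapply(list_assets, rescale_obs)`.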
I have to rename the elements of a list of matrices called l1. Each Name(n) object holds the new name as a character string. Here is my code:
names(l1)[1] <- Name1
names(l1)[2] <- Name2
names(l1)[3] <- Name3
names(l1)[4] <- Name4
## ...
names(l1)[43] <- Name43
As you can see, I have 43 sublists. Is there a way to do that with an automated loop such as for (i in 1:43) or something similar? I tried to write a loop, but I am a beginner and find it very hard for now.
Edit: I would like to rename the elements of my list without having to type 43 lines manually. Here are the first three elements of my list:
str(l1)
List of 43
$ XXX : num [1:640, 1:3] -0.83 -0.925 -0.623 -0.191 0.155 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "EV_BICYCLE" "HW_DISTANCE" "NO_ASSETS"
$ XXX : num [1:640, 1:2] -0.159 0.485 -0.686 -0.245 -3.361 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:2] "HOME_OWN" "METRO_DISTANCE"
$ XXX : num [1:640, 1:3] -0.79 1.15 0.224 0.388 -1.571 ...
..- attr(*, "dimnames")=List of 2
.. ..$ : NULL
.. ..$ : chr [1:3] "BICYCLE" "HOME_OWN_SC" "POP_SC"
That is to say, I would like to replace the 43 XXX names with Name1, Name2, ... up to Name43.
Try
names(l1) <- unlist(mget(ls(pattern="^Nom_F")))
str(l1, list.len=2)
#List of 3
# $ Accessibility : int [1:5, 1:5] 10 10 3 9 7 6 8 2 7 8 ...
# ..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:5] "A" "B" "C" "D" ...
# $ Access : int [1:5, 1:5] 6 4 10 5 9 8 9 4 7 1 ...
#..- attr(*, "dimnames")=List of 2
# .. ..$ : NULL
# .. ..$ : chr [1:5] "A" "B" "C" "D" ...
Instead of creating separate objects, you could create a vector of real titles. For example
v1 <- LETTERS[1:3]
names(l1) <- v1
data
set.seed(42)
l1 <- setNames(lapply(1:3, function(x)
matrix(sample(1:10, 5*5, replace=TRUE), ncol=5,
dimnames=list(NULL, LETTERS[1:5]))), rep('XXX',3))
Nom_F1 <- "Accessibility"
Nom_F2 <- "Access"
Nom_F3 <- "Poverty_and_SC"
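One thing to watch with the ls(pattern = ...) approach when there are 43 objects: ls() returns names in alphabetical order, so Nom_F10 would come before Nom_F2. A sketch that builds the object names in numeric order instead (the 12-element list and the "Title" values are made up for illustration):

```r
# Hypothetical list of 12 matrices, all named "XXX" as in the question
l1 <- setNames(replicate(12, matrix(0, 2, 2), simplify = FALSE), rep("XXX", 12))

# Hypothetical title objects Nom_F1 ... Nom_F12
for (i in 1:12) assign(paste0("Nom_F", i), paste0("Title", i))

# Build the object names in numeric order rather than relying on ls()
wanted <- paste0("Nom_F", seq_along(l1))
names(l1) <- unlist(mget(wanted), use.names = FALSE)

names(l1)
```

Here the names come out as Title1, Title2, ... in list order, whatever the alphabetical order of the objects is.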