dplyr Mutate Creating Matrix Instead of Vector - r

I am creating a new column that looks at conditions in my data frame and alerts me whether an issue needs to be investigated or monitored. The code to add the column looks like this:
library(dplyr)
df %>%
mutate("Status" =
ifelse(apply(.[2:7], 1, sum) > 0 & .[8] > 0, "Investigate",
"Monitor"
)
)
If I run class(df$Status) on this newly generated column, the class is listed as 'matrix'. Why isn't it listed as 'character'?
If I look at the structure of my data frame, there's an oddity that may be the key, but I don't understand it. Notice that the first columns listed simply look like integers, but after the third column listed, which holds the same kind of data, there is all this 'attr' output. What is going on?
$ 2017-08 : int NA 1 NA 1 1 2 NA NA NA NA ...
$ 2017-09 : int NA NA 1 NA NA NA NA NA NA NA ...
$ 2017-10 : int NA NA NA NA NA NA 1 NA NA NA ...
- attr(*, "vars")= chr "Material"
- attr(*, "drop")= logi TRUE
- attr(*, "indices")=List of 34
..$ : int 0
..$ : int 1
..$ : int 2
..$ : int 3
..$ : int 4
...continued...
- attr(*, "group_sizes")= int 1 1 1 1 1 1 1 1 1 1 ...
- attr(*, "biggest_group_size")= int 1
- attr(*, "labels")='data.frame': 34 obs. of 1 variable:
I grouped variables earlier and sometimes ungrouping magically helps. In addition I often have to convert tibbles back to data frames to get other routines to work in my code. This may or may not be related.
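One likely cause, sketched here in base R on made-up data (the data frame `df` and its columns `a`, `b` and `flag` are assumptions for illustration, not the asker's data): single-bracket indexing such as `.[8]` returns a one-column data frame, comparing that with `> 0` yields a logical matrix, and `ifelse()` keeps the shape of its test. Double brackets return a plain vector instead.

```r
# Hypothetical stand-in for the asker's data: columns 2:3 are counts, column 4 is the flag.
df <- data.frame(id = 1:3, a = c(1, 0, 2), b = c(0, 0, 1), flag = c(1, 0, 0))

# df[4] is a one-column data frame, so the comparison gives a 3 x 1 logical matrix,
# and ifelse() returns a result with the same shape as its test.
status_matrix <- ifelse(rowSums(df[2:3]) > 0 & df[4] > 0, "Investigate", "Monitor")

# df[[4]] is a plain vector, so the result is a plain character vector.
status_vector <- ifelse(rowSums(df[2:3]) > 0 & df[[4]] > 0, "Investigate", "Monitor")

is.matrix(status_matrix)  # TRUE
is.matrix(status_vector)  # FALSE
```

In the original pipeline this would correspond to writing `.[[8]]` instead of `.[8]`. The 'attr' lines in the str() output look like leftovers from the earlier grouping, so calling ungroup() before the mutate may also be worth trying.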

Related

Rayshader: Rendered polygons don't align with the surface height

This is my first post, and I will try to describe my problem as exactly as I can without writing a novel. Since English is not my native language, please forgive any ambiguities or spelling errors.
I am currently trying out the rayshader package for R to visualise several layers and create a representation of georeferenced data from Berlin. The data I have is a DEM (5 m resolution) and a GeoJSON file containing a building layer with building heights, a water layer, and a tree layer with tree heights.
For now only the DEM and the building layer are used.
I can render the DEM without any problems. The building polygons are also extruded and rendered, but their foundation height does not match the corresponding height in the elevation matrix created from the DEM.
I expected the polygons to be placed correctly and "stand" on the rendered surface, but most of them clip through the surface or are stuck inside the ground layer. My assumption is that I am using the wrong function for my purpose: the creator of the package uses render_multipolygonz() for buildings, as can be seen here (timecode 12:49). I tried that, but it just renders an unextruded continuous polygon underneath the ground of my base layer.
Or perhaps I am missing an argument of the render_polygons() function.
It is also quite possible that I am making a simple calling or assignment error, since I am anything but an expert in R. I am just starting my coding journey.
Here is my code:
#set wd to save location
setwd(dirname(rstudioapi::getActiveDocumentContext()$path))
#load libs
library(geojsonR)
library(rayshader)
library(raster)
library(sf)
library(rgdal)
library(dplyr)
library(rgl)
#load DEM
tempel_DOM <- raster("Daten/Tempelhof_Gelaende_5m_25833.tif")
#load buildings layer from GEOJSON
buildings_temp <-
st_read(dsn = "Daten/Tempelhof_GeoJSON_25833.geojson", layer = "polygon") %>%
st_transform(crs = st_crs(tempel_DOM)) %>%
filter(!is.na(bh))
#create elevation matrix from DEM
tempel_elmat <- raster_to_matrix(tempel_DOM)
#Tempelhof Render
tempel_elmat %>%
sphere_shade(texture = "imhof1") %>%
add_shadow(ray_shade(tempel_elmat), 0.5) %>%
plot_3d(
tempel_elmat,
zscale = 5,
fov = 0,
theta = 135,
zoom = 0.75,
phi = 45,
windowsize = c(1000, 800)
)
render_polygons(
buildings_temp,
extent = extent(tempel_DOM),
color = 'hotpink4',
parallel = TRUE,
data_column_top = 'bh',
clear_previous = TRUE
)
The structure of my buildings_temp using str() is:
> str(buildings_temp)
Classes ‘sf’ and 'data.frame': 625 obs. of 11 variables:
$ t : int 1 1 1 1 1 1 1 1 1 1 ...
$ t2 : int NA NA NA NA NA NA NA NA NA NA ...
$ t3 : int NA NA NA NA NA NA NA NA NA NA ...
$ t4 : int NA NA NA NA NA NA NA NA NA NA ...
$ t1 : int 1 4 1 1 1 1 1 1 1 1 ...
$ bh : num 20.9 2.7 20.5 20.1 19.3 20.9 19.7 19.8 19.6 17.8 ...
$ t5 : int NA NA NA NA NA NA NA NA NA NA ...
$ t6 : int NA NA NA NA NA NA NA NA NA NA ...
$ th : num NA NA NA NA NA NA NA NA NA NA ...
$ id : int 261 262 263 264 265 266 267 268 269 270 ...
$ geometry:sfc_MULTIPOLYGON of length 625; first list element: List of 1
..$ :List of 1
.. ..$ : num [1:12, 1:2] 393189 393191 393188 393182 393177 ...
..- attr(*, "class")= chr [1:3] "XY" "MULTIPOLYGON" "sfg"
- attr(*, "sf_column")= chr "geometry"
- attr(*, "agr")= Factor w/ 3 levels "constant","aggregate",..: NA NA NA NA NA NA NA NA NA NA
..- attr(*, "names")= chr [1:10] "t" "t2" "t3" "t4" ...
Thanks in advance for any help.
Cheers WiTell

How to code Simple returns for multiple columns?

How do I code this formula:
Simple return at time t = (P_t / P_{t-1}) - 1
I have tried the below, but keep getting the wrong numbers.
stockindices = read.csv('https://raw.githubusercontent.com/bandcar/Examples/main/stockInd.csv')
library(tidyverse)
simple_returns <- stockindices %>%
mutate(across(3:ncol(.), ~ ((.x / lag(.x-1))-1)))
You had too many -1's in your expression:
simple_returns <- stockindices %>%
mutate(across( 3:ncol(.), ~ .x / lag(.x)-1))
str(simple_returns)
'data.frame': 3978 obs. of 8 variables:
$ X : int 1 2 3 4 5 6 7 8 9 10 ...
$ Date: chr "1999-04-01" "1999-05-01" "1999-06-01" "1999-07-01" ...
$ DJX : num NA 0.01382 0.025107 -0.000755 0.011068 ...
$ SPX : num NA 0.01358 0.02214 -0.00205 0.00422 ...
$ HKX : num NA 0.00835 0.03465 0.04493 0.00272 ...
$ NKX : num NA -0.01365 0.01781 0.00506 -0.01069 ...
$ DAX : num NA 0.000295 0.036108 -0.022119 0.01308 ...
$ UKX : num NA 0.0134 0.03199 -0.00774 0.00754 ...
You could have put parentheses around .x/lag(.x), but it's not necessary here because division has higher precedence than subtraction in R. The default lag interval is 1, so it doesn't need to be passed to lag. If you had wanted two-period returns (each price relative to the one two rows earlier), it would have been
~ .x/lag(.x, 2) - 1
And as always, it pays to make sure that dplyr::lag is the function being called (i.e. that dplyr is loaded and masks stats::lag); stats::lag is quite different and doesn't play nicely with the tidyverse.
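To see why the extra -1 matters, here is a small worked example in base R on made-up prices. The helper `lag1` is a hypothetical stand-in for dplyr::lag with its default interval of 1, used only so the sketch runs without any packages.

```r
# Made-up price series, for illustration only.
prices <- c(100, 110, 99)

# Hypothetical helper: shift a vector down by one position, padding with NA.
lag1 <- function(x) c(NA, head(x, -1))

# Correct: P_t / P_{t-1} - 1
simple_returns <- prices / lag1(prices) - 1

# The original attempt: subtracts 1 from the price before lagging,
# i.e. P_t / (P_{t-1} - 1) - 1, which is a different quantity.
wrong_returns <- prices / lag1(prices - 1) - 1

round(simple_returns, 4)  # NA  0.1 -0.1
round(wrong_returns, 4)   # NA  0.1111 -0.0917
```

The first element is NA in both cases because there is no previous price to compare with.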

Subsetting SPSS data imported into r with package haven?

I've used the package haven to read SPSS data into R. All seems ok, except that when I try to subset the data it doesn't seem to behave correctly. Here's the code (I don't have SPSS to create example data and can't post the real stuff):
require(haven)
df <- read_spss("filename1.sav")
tmp <- df[as_factor(df$variable1) == "factor1",]
tmp <- tmp[!is.na(tmp$variable2), ]
The above df has "NA" scattered throughout. I expected the above to subset only the data, keeping only rows with variable1 with "factor1" and discarding all rows with NAs in variable2. The first subset works as expected. But the second subset does not. It removes rows, but NAs are still present.
I suspect the issue has something to do with the way haven structures the imported data and uses the class labelled instead of an actual factor variable, but it's over my head. Anyone know what could be happening and how to accomplish the same?
Here's the structure of df, variable1 and variable2:
> str(df)
'data.frame': 4573 obs. of 316 variables:
> str(df$variable1)
Class 'labelled' atomic [1:4573] 9 9 9 14 8 8 2 4 8 16 ...
..- attr(*, "labels")= Named num [1:18] 1 2 3 4 5 6 7 8 9 10 ...
.. ..- attr(*, "names")= chr [1:18] "factor1" "factor2" "factor3" "factor4" ...
> str(df$variable2)
Class 'labelled' atomic [1:4573] 3 NA 3 NA 3 NA 1 1 NA NA ...
..- attr(*, "labels")= Named num [1:3] 1 2 3
.. ..- attr(*, "names")= chr [1:3] "Sponsor" "Not a Sponsor" "Don't Know"

Aggregate - na.omit and na.pass in R with factor (group by factor)?

I have a data set containing salary test data. Not all cells have values, so I used na.action=na.pass and na.rm=TRUE, but the call fails, apparently because I am aggregating with JobTitle, which is a factor.
So far I have developed below code:
aggregate(salaries$JobTitle,
list(pay = salaries$TotalPay),
FUN=mean,
na.action=na.pass,
na.rm=TRUE)
My test data has the following columns:
'data.frame': 104 obs. of 36 variables:
$ Id : int 1 2 3 4 5 6 7 8 9 10 ...
$ EmployeeName : Factor w/ 11 levels "","ALBERT PARDINI",..: 10 7 2 4 11 6 3 5 9 8 ...
$ JobTitle : Factor w/ 9 levels "","ASSISTANT DEPUTY CHIEF II",..: 8 4 4 9 6 2 3 7 3 5 ...
$ BasePay : num 167411 155966 212739 77916 134402 ...
$ OvertimePay : num 0 245132 106088 56121 9737 ...
$ OtherPay : num 400184 137811 16453 198307 182235 ...
$ Benefits : logi NA NA NA NA NA NA ...
$ TotalPay : num 567595 538909 335280 332344 326373 ...
$ TotalPayBenefits: num 567595 538909 335280 332344 326373 ...
$ Year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
$ Notes : logi NA NA NA NA NA NA ...
$ Agency : Factor w/ 2 levels "","San Francisco": 2 2 2 2 2 2 2 2 2 2 ..
The warnings that come up are
Warning messages:
1: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
2: In mean.default(X[[i]], ...) :
argument is not numeric or logical: returning NA
etc...
I have tried with salaries$Id and it works like magic, so I assume the code is correct and perhaps I need to change the data type for JobTitle?
If we want the mean of 'TotalPay' grouped by 'JobTitle', the formula method would be
aggregate(TotalPay~JobTitle, salaries, mean, na.rm=TRUE, na.action=na.pass)
Or use
aggregate(salaries$TotalPay, list(salaries$JobTitle), FUN=mean, na.rm=TRUE)
data
set.seed(24)
salaries <- data.frame(JobTitle = sample(LETTERS[1:5], 20,
replace=TRUE), TotalPay= sample(c(1:20, NA), 20))
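A smaller worked example may make the roles of the two NA arguments clearer. The data frame `toy` and its values are made up for illustration. The value being averaged goes on the left of the formula (or first in the default method) and the factor goes in the grouping position, which is the reverse of the call in the question.

```r
# Made-up data: group "C" has only a missing value.
toy <- data.frame(JobTitle = c("A", "A", "B", "B", "C"),
                  TotalPay = c(10, NA, 30, 50, NA))

# na.action = na.pass keeps rows with NA; na.rm = TRUE is passed on to mean().
kept <- aggregate(TotalPay ~ JobTitle, toy, mean, na.rm = TRUE, na.action = na.pass)

# With the default na.action (na.omit), rows with NA are dropped before grouping,
# so group "C" disappears from the result.
dropped <- aggregate(TotalPay ~ JobTitle, toy, mean)

# Non-formula equivalent: numeric column first, grouping factor in a list.
by_list <- aggregate(toy$TotalPay, list(JobTitle = toy$JobTitle), FUN = mean, na.rm = TRUE)

kept     # A = 10, B = 40, C = NaN
dropped  # A = 10, B = 40
```

Whether an all-missing group should show up as NaN or be dropped depends on what the summary is for, so either choice of na.action can be reasonable.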

Why does mutate change the variable type?

activity <- mutate(
activity, steps = ifelse(is.na(steps), lookup_mean(interval), steps))
The "steps" variable changes from an int to a list. I want it to stay an "int" so I can aggregate it (aggregate is failing because it is a list type).
Before:
> str(activity)
'data.frame': 17568 obs. of 3 variables:
$ steps : int NA NA NA NA NA NA NA NA NA NA ...
$ date : Factor w/ 61 levels "2012-10-01","2012-10-02",..: 1 1 1 1 1 1 1 1 1 1 ...
$ interval: int 0 5 10 15 20 25 30 35 40 45 ...
After:
> str(activity)
'data.frame': 17568 obs. of 3 variables:
$ steps :List of 17568
..$ : num 1.72
..$ : num 1.72
Lookup mean is defined here:
lookup_mean <- function(i) {
return(filter(daily_activity_pattern, interval == 0) %>% select(steps))
}
The problem is that lookup_mean returns a data frame (which is a list), so each value in activity$steps ends up wrapped in a list. lookup_mean should return the column as a plain vector:
lookup_mean <- function(i) {
interval <- filter(daily_activity_pattern, interval == 0) %>% select(steps)
return(interval$steps)
}
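Note that both versions above look up interval 0 regardless of the argument `i`. If the intent is to fill each missing value with the mean for its own interval, one possible vectorised variant in base R uses match(). This is a sketch under assumptions: the two data frames below are made-up stand-ins for the asker's `daily_activity_pattern` and `activity`.

```r
# Made-up stand-ins for the asker's data.
daily_activity_pattern <- data.frame(interval = c(0, 5, 10), steps = c(1.5, 0.5, 2.0))
activity <- data.frame(steps = c(NA, 3L, NA), interval = c(0, 5, 10))

# Return a plain numeric vector: the mean for each requested interval.
lookup_mean <- function(i) {
  daily_activity_pattern$steps[match(i, daily_activity_pattern$interval)]
}

activity$steps <- ifelse(is.na(activity$steps),
                         lookup_mean(activity$interval),
                         activity$steps)

str(activity$steps)  # num [1:3] 1.5 3 2
```

The column comes out as numeric rather than integer because the means are fractional, but it is an atomic vector again, so aggregate() can work with it.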
