R - Type Casting With Map()

I would like to create a new column that extracts the hour from a timestamp as a numeric data type. If I had one data frame or tibble, I would do it as follows:
calories_hourly$activity_hour_num <- calories_hourly$activity_hour %>% mdy_hms() %>% format(format = ('%H')) %>% as.numeric()
However, I have a list of 18 tibbles called "fitbit_data", and I would like to perform the operation above on tibbles 6-16. In each of these tibbles, the timestamp to convert is the second column. Below is the beginning of a failed attempt:
fitbit_data[6:16] <- fitbit_data[6:16] %>% mutate(activity_hour_num=map(.x=fitbit_data[6:16], .f=~mdy(.x[2])))
Can you please help me code a tidy solution for this R task?
Thank you so much!

You can use map() to modify the selected list elements as follows:
library(purrr)
library(lubridate)
library(dplyr)
k <- 6:16
fitbit_data[k] <- map(fitbit_data[k], ~ {.x[[2]] <- lubridate::mdy(.x[[2]]); .x})
Or, based on your first attempt and mirroring your single-tibble code, extract the hour as a numeric column:
fitbit_data[k] <- map(fitbit_data[k], ~ .x %>%
  mutate(activity_hour_num = mdy_hms(activity_hour) %>%
           format('%H') %>%
           as.numeric()))
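As an alternative, purrr::modify_at() edits only the selected list positions and returns the whole list, so no subset-assignment is needed. A minimal sketch, assuming the timestamp in each of the targeted tibbles is a character column named activity_hour:
fitbit_data <- modify_at(
  fitbit_data, 6:16,
  ~ mutate(.x, activity_hour_num = as.numeric(format(mdy_hms(activity_hour), '%H')))
)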

Related

tidyjson: Replace existing column values in dataframe with output of spread_values

I have a data frame that has a column with JSON values. I found the library "tidyjson" which helps to extract this JSON. However, it is always extracted into a new data frame.
I am looking for a way to replace the JSON in the original data frame with the result of tidyjson.
Code:
mydf <- df$response %>%
  as.tbl_json %>%
  gather_array %>%
  spread_values(text = jstring('text'))
Is there a way to replace df$response with the extracted JSON "text" value?
Thanks in advance!
This solution worked for me:
df %>%
  as.tbl_json(json.column = 'response') %>%
  gather_array %>%
  spread_values(response = jstring('text'))
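For context, here is a minimal self-contained sketch of that pattern; the toy data frame and its JSON structure are assumptions (each response cell holds a JSON array of objects with a text field):
library(tidyjson)
library(dplyr)

# toy data (hypothetical): 'response' holds a JSON array per row
df <- data.frame(
  id = 1:2,
  response = c('[{"text": "hello"}]', '[{"text": "world"}]'),
  stringsAsFactors = FALSE
)

df %>%
  as.tbl_json(json.column = 'response') %>%  # treat 'response' as the JSON column
  gather_array() %>%                         # one row per array element
  spread_values(response = jstring('text'))  # extract 'text' into 'response'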

Dataframe in wide format to dataframe of time series

I am currently struggling with reshaping my dataset into my preferred result. Let's say I have the following dataset to start with:
library(tsbox)
library(dplyr)
library(tidyr)
# create df that matches my format
df1 <- ts_wide(ts_df(ts_c(mdeaths)))
df1$id <- 1
df2 <- ts_wide(ts_df(ts_c(mdeaths)))
df2$id <- 2
df <- rbind(df1, df2)
Now this dataset has a date column, a value column and an "id" column, which specifies which date/value points belong to the same observation object. I would now like to reshape my dataset into a 2x2 data frame, where the first column is the id and the second column is a time-series object (of the date/value pairs corresponding to that id). To do so, I tried the following:
# create a new df, with two cols (id and ts)
df_ts <- df %>%
group_by(id) %>%
nest()
The nest command creates "a list-column of data frames", which is not exactly what I wanted. I know that a ts can be defined via ts(data$value, data$date), but I do not know how to integrate it after the group_by(id) step. Can anyone help me turn this column into a ts object instead of a data frame? I am new to R and grateful for any form of help.
Thanks in advance
If you have a non-atomic data type, it will have to be a list-column of something.
If you want a list-column of ts objects, you can:
df %>%
  group_by(id) %>%
  summarize(ts = list(ts(value, time)))
Continuing your pipe you could:
df %>%
  group_by(id) %>%
  nest() %>%
  mutate(data = purrr::map(data, with, ts(value, time)))
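Note that ts() interprets its second argument as the series start, so with real data you will usually want to pass start and frequency explicitly. A sketch, assuming monthly data such as mdeaths (January 1974, frequency 12):
df_ts <- df %>%
  group_by(id) %>%
  summarize(ts = list(ts(value, start = c(1974, 1), frequency = 12)))

class(df_ts$ts[[1]])  # "ts"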

R version 3.6.3 (2020-02-29) | Using package dplyr_1.0.0 | Unable to perform summarise() function

Trying to perform a basic summarise() but getting the same error again and again!
I have a large number of CSV files with 4 columns. I am reading them into R using lapply and rbinding them. Next, I need to see the number of complete observations present for each ID.
Error:
Problem with `summarise()` input `complete_cases`.
x unused argument (Date)
i Input `complete_cases` is `n(Date)`.
i The error occurred in group 1: ID = 1.
Code:
library(dplyr)
merged <- do.call(rbind, lapply(list.files(), read.csv))
merged <- as.data.frame(merged)
remove_na <- merged[complete.cases(merged), ]
new_data <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n(Date))
The problem is not coming from summarise() but from n().
If you look at the help page ?n, you will see that n() is used without any arguments, like this:
new_data_count <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n())
This will count the number of rows for each ID group and is independent from the Date column. You could also use the shortcut function count:
new_data_count <- remove_na %>% count(ID)
If you want to count the different Date values, you might want to use n_distinct:
new_data_count_dates <- remove_na %>% group_by(ID) %>% summarise(complete_cases = n_distinct(Date))
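To see the difference between the two counts, here is a toy example (the ID and Date values are made up):
toy <- data.frame(
  ID   = c(1, 1, 1, 2, 2),
  Date = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02",
                   "2020-01-01", "2020-01-02"))
)
toy %>% group_by(ID) %>% summarise(rows = n(), distinct_dates = n_distinct(Date))
# ID 1: rows = 3, distinct_dates = 2; ID 2: rows = 2, distinct_dates = 2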
As an aside, you could have read and combined the files with purrr::map, whose variants are often more convenient than the *apply functions because the suffix specifies the return type. That could look like this:
library(purrr)
remove_na = map_dfr(list.files(), read.csv) %>% na.omit()
Here, map_dfr returns a single data frame by binding rows; map_dfc would instead return a data frame by binding columns.

transpose and filter multiple columns into single vector

I have CSVs in a clunky format (National Weather Service 'DATACARD' format - sample monthly data on page 3 here) and I'm hoping to find a better way of transposing and filtering out NAs. I think there may be something along the lines of gather() from the tidyverse, but I'm open to all approaches.
a <- c(10.5,14,16,20,23)
b <- c(11,15,17,21,24)
c <- c(12,NA,18,22,25.2)
d <- c(13,NA,19,NA,26)
rawcsv <- data.frame(a,b,c,d)
rawcsv_singlecolumn <- data.frame(singlecolumn=c(t(rawcsv)))
rawcsv_NAsremoved_thedesiredvector <- na.omit(rawcsv_singlecolumn)
desiredvector <- c(10.5,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25.2,26)
We can extract the single column from the dataset:
rawcsv_NAsremoved_thedesiredvector[[1]]
If we need to use the tidyverse:
library(tidyverse)
rownames_to_column(rawcsv, 'rn') %>%
  gather(key, value, -rn, na.rm = TRUE) %>%
  arrange(as.integer(rn)) %>%
  pull(value)
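In newer versions of tidyr (>= 1.0.0), gather() has been superseded by pivot_longer(); a sketch of the same idea:
library(dplyr)
library(tidyr)

rawcsv %>%
  mutate(rn = row_number()) %>%                 # remember the original row
  pivot_longer(-rn, values_drop_na = TRUE) %>%  # stack each row's columns, dropping NAs
  arrange(rn) %>%                               # row-major order
  pull(value)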

Name the column of data frame and set as factor at the same time

I need your help to simplify the following code.
I need to name the columns of a matrix and format each of them as a factor.
How can I do that for 100 columns without doing it one by one?
z <- matrix(sample(seq(3),n*p,replace=TRUE),nrow=n)
train.data <- data.frame(x1 = factor(z[, 1]), x2 = factor(z[, 2]), ..., x100 = factor(z[, 100]))
Here's one option
setNames(data.frame(lapply(split(z, col(z)), factor)), paste0("x", 1:p))
or use magrittr piping syntax
library(magrittr)
split(z, col(z)) %>%
  lapply(factor) %>%
  data.frame %>%
  setNames(paste0("x", 1:p))
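A base-R-leaning variant of the same idea, sketched with assumed values n <- 20 and p <- 100 (neither is defined in the question): convert the matrix to a data frame, turn every column into a factor, then rename.
n <- 20
p <- 100
z <- matrix(sample(seq(3), n * p, replace = TRUE), nrow = n)

train.data <- as.data.frame(lapply(as.data.frame(z), factor))
names(train.data) <- paste0("x", seq_len(p))
str(train.data[, 1:3])  # each column is now a factor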
