tidyjson: Replace existing column values in dataframe with output of spread_values

I have a data frame with a column that contains JSON values. I found the "tidyjson" library, which helps extract this JSON. However, the result always ends up in a new data frame.
I am looking for a way to replace the JSON in the original data frame with the result from tidyjson.
Code:
mydf <- df$response %>% as.tbl_json %>% gather_array %>%
  spread_values(text = jstring('text'))
Is there a way to replace "df$response" with the extracted JSON "text" value?
Thanks in advance!

This solution worked for me:
df %>% as.tbl_json(json.column = 'response') %>% gather_array %>%
  spread_values(response = jstring('text'))
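
For anyone who wants to try this end to end, here is a minimal sketch on made-up data. The data frame `df`, its `id` column and the JSON strings are assumptions for illustration only; the calls are the same ones used in the answer above. Note that gather_array() also adds an `array.index` column.

```r
library(dplyr)     # for the %>% pipe
library(tidyjson)

# Hypothetical sample data: each response holds a JSON array of objects
df <- data.frame(
  id = c(1, 2),
  response = c('[{"text": "hello"}]', '[{"text": "world"}]'),
  stringsAsFactors = FALSE
)

result <- df %>%
  as.tbl_json(json.column = "response") %>%   # parse the JSON held in the response column
  gather_array() %>%                          # one row per array element
  spread_values(response = jstring("text"))   # store each "text" value in a column named response

print(result)
```

If an array holds more than one element, the result has one row per element, so the other columns of `df` are repeated.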

Related

dataframe in wideformat to dataframe of timeseries

I am currently struggling to reshape my dataset into the form I want. Let's say I have the following dataset to start with:
library(tsbox)
library(dplyr)
library(tidyr)
# create df that matches my format
df1 <- ts_wide(ts_df(ts_c(mdeaths)))
df1$id <- 1
df2 <- ts_wide(ts_df(ts_c(mdeaths)))
df2$id <- 2
df <- rbind(df1, df2)
Now this dataset has a date column, a value column and an "id" column, which should specify which date/value points belong to the same observation object. I would now like to reshape my dataset into a 2x2 data frame, where the first column is the id and the second column is a time series object (built from the date/value pairs for that id). To do so, I tried the following:
# create a new df, with two cols (id and ts)
df_ts <- df %>%
  group_by(id) %>%
  nest()
The nest command creates "a list-column of data frames", which is not exactly what I wanted. I know that a ts can be defined via ts(data$value, data$date), but I do not know how to integrate it after the group_by(id) function. Can anyone help me turn this column into a ts object instead of a data frame? I am new to R and grateful for any form of help.
Thanks in advance

If you have a non-atomic data type it will have to be a list column of something.
If you want a list-column of ts objects, you can do:
df %>%
  group_by(id) %>%
  summarize(ts = list(ts(value, time)))
Or, continuing your pipe:
df %>%
  group_by(id) %>%
  nest() %>%
  mutate(data = purrr::map(data, with, ts(value, time)))
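
One thing to watch: the second argument of base R's ts() is `start` (a single number, or a year/period pair), not a vector of dates. The sketch below is one way to derive `start` from the first date of each group. It assumes monthly data with columns named `time` (Date), `value` and `id`. The helper `to_monthly_ts()` is hypothetical, and the sample data is rebuilt from `mdeaths` without tsbox so the block runs on its own.

```r
library(dplyr)

# Assumed layout: a Date column "time", a numeric "value", and an "id"
make_df <- function(id) {
  data.frame(
    time  = seq(as.Date("1974-01-01"), by = "month", length.out = length(mdeaths)),
    value = as.numeric(mdeaths),
    id    = id
  )
}
df <- rbind(make_df(1), make_df(2))

# Hypothetical helper: build a monthly ts starting at the earliest date
to_monthly_ts <- function(value, time) {
  first_date <- as.POSIXlt(min(time))
  ts(value,
     start = c(first_date$year + 1900, first_date$mon + 1),
     frequency = 12)
}

# One row per id, with the ts objects stored in a list-column
df_ts <- df %>%
  group_by(id) %>%
  summarize(ts = list(to_monthly_ts(value, time)))
```

Individual series can then be pulled out with `df_ts$ts[[1]]`. For other frequencies, the `frequency` argument and the way `start` is computed would need to change.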

R - Type Casting With Map()

I would like to create a new column that extracts the hour from a timestamp as a numeric data type. If I had one data frame or tibble, I would do it as follows:
calories_hourly$activity_hour_num <- calories_hourly$activity_hour %>% mdy_hms() %>% format(format = ('%H')) %>% as.numeric()
However, I have one list of 18 tibbles called "fitbit_data" where I would like to perform the operation above for tibbles 6-16. The type casting is calculated from the second column in all of my tibbles. I have an example of the beginning of a failed attempt below:
fitbit_data[6:16] <- fitbit_data[6:16] %>% mutate(activity_hour_num=map(.x=fitbit_data[6:16], .f=~mdy(.x[2])))
Can you please help me code a tidy solution for this R task?
Thank you so much!

You can use map() like this:
library(purrr)
library(lubridate)
library(dplyr)
k <- 6:16
fitbit_data[k] <- map(fitbit_data[k], ~{.x[[2]] <- lubridate::mdy(.x[[2]]); .x})
Based on the first attempt you can do:
fitbit_data[k] <- map(fitbit_data[k], ~.x %>%
  mutate(activity_hour = mdy_hms(activity_hour) %>%
    format('%H') %>% as.numeric()))
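
If the goal is a new numeric column rather than overwriting the timestamp, a variation along these lines might help. It is only a sketch: the three small data frames stand in for `fitbit_data`, the timestamps are written in 24-hour form, and `add_hour()` is a hypothetical helper. It reads the second column by position, as described in the question, and uses lubridate's hour() in place of format('%H') followed by as.numeric().

```r
library(purrr)
library(lubridate)

# Hypothetical stand-in for fitbit_data: a list of three small data frames
fitbit_data <- list(
  data.frame(id = 1, note = "left untouched"),
  data.frame(id = c(1, 1),
             activity_hour = c("04/12/2016 01:00:00", "04/12/2016 14:00:00")),
  data.frame(id = c(2, 2),
             activity_hour = c("04/13/2016 00:00:00", "04/13/2016 23:00:00"))
)

k <- 2:3  # positions to transform (6:16 in the question)

# Hypothetical helper: parse the second column and append the hour as a number
add_hour <- function(tbl) {
  tbl$activity_hour_num <- hour(mdy_hms(tbl[[2]]))
  tbl
}

fitbit_data[k] <- map(fitbit_data[k], add_hour)
```

Assigning back into `fitbit_data[k]` leaves the other list elements as they were.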

load JSON data into a dataframe

I am a beginner working with R and especially JSON files, and this is probably a simple question but I have been unsuccessful for a while.
Here is a sample row of data from a provided text file (there are ~4000 rows):
{"040070005001":4,"040070005003":4,"040138101003":4,"040130718024":4}
Each row has a variable number of values in the string.
I am trying to use a loop, but it only loads the last row of the data set rather than capturing the data from each row:
for (row in 1:nrow(origins)) {
  json <- origins$home_cbgs[row] %>%
    fromJSON() %>%
    unlist() %>%
    as.data.frame() %>%
    rownames_to_column() %>%
    rename(
      origin_census_block_group = "rowname",
      origin_visitors = "."
    )
}
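
The loop above assigns to `json` on every pass, so only the last row survives. A possible fix, which is not from the original thread, is to build one small data frame per row and bind them together at the end. The sketch below assumes `origins` has a character column `home_cbgs`; the two sample rows, the helper `parse_row()` and the result name `visitors` are made up for illustration.

```r
library(jsonlite)

# Hypothetical sample data in the same shape as the question
origins <- data.frame(
  home_cbgs = c(
    '{"040070005001":4,"040070005003":4}',
    '{"040138101003":4,"040130718024":5,"040130718025":6}'
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: turn one JSON string into a long data frame
parse_row <- function(row) {
  counts <- unlist(fromJSON(origins$home_cbgs[row]))
  data.frame(
    row = row,                                  # source row in origins
    origin_census_block_group = names(counts),  # JSON keys
    origin_visitors = as.numeric(counts),       # JSON values
    stringsAsFactors = FALSE
  )
}

# Parse every row, then stack the pieces into one data frame
visitors <- do.call(rbind, lapply(seq_len(nrow(origins)), parse_row))
```

The `row` column records which input row each block group came from, so the result can be joined back to `origins` if needed.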

R, Remove duplicate rows conditional on value of variable

This is my first post here. I have a large dataset and I am trying to remove duplicate rows based on the value of one of the specified variables (ERRaw). When I use the following code, the resulting dataset excludes some cases that did not have duplicates in the original, and I don't understand why. I need to keep all singleton cases and only remove duplicates. Please help!
new_data <- data_with_dups %>%
  group_by(StudentID, District) %>%
  distinct(StudentID, ERRaw, .keep_all = T) %>%
  top_n(1, ERRaw)
Thank you!

I think any of these should work. If you provide copy/pasteable sample data, I'll test and make sure.
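# group_by, sort by ERRaw, then keep the first row of each group
# (top_n(1) without a wt argument would rank by the last column, not ERRaw)
new_data <- data_with_dups %>%
  group_by(StudentID, District) %>%
  arrange(desc(ERRaw)) %>%
  slice(1)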
# base R sort, !duplicated
new_data = data_with_dups[order(data_with_dups$ERRaw, decreasing = TRUE), ]
new_data = new_data[!duplicated(new_data[c("StudentID", "District")]), ]
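
Since no sample data was posted, here is a small made-up example to check the behaviour. The values are assumptions for illustration only. It uses dplyr's slice_max(), which is a different function from the top_n() call in the question, with `with_ties = FALSE` so that exactly one row is kept per student and district.

```r
library(dplyr)

# Hypothetical data: student 2 is a singleton, students 1 and 3 have duplicates
data_with_dups <- data.frame(
  StudentID = c(1, 1, 2, 3, 3),
  District  = c("A", "A", "A", "B", "B"),
  ERRaw     = c(10, 15, 7, 20, 12)
)

new_data <- data_with_dups %>%
  group_by(StudentID, District) %>%
  slice_max(ERRaw, n = 1, with_ties = FALSE) %>%  # highest ERRaw per group
  ungroup()
```

The singleton for student 2 is kept, and each duplicated student is reduced to the row with the highest ERRaw.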

R dplyr group_by subject appears to use entire dataframe instead of subject

Background
I am working with a large dataset from a repeated measures clinical trial in R, where I want to do some data manipulations for each subject. This could be extraction of the max value in column x for each subject or the mean of column y for each subject.
Problem
I am fond of using the dplyr package and pipes, which led me to the group_by function. But when I try to apply it, the data that I want to extract does not seem to group by subject as it is supposed to, but rather extracts data based on the entire dataset.
Code
This is what I have done so far:
data <- read.csv(file="group_by_question.csv", header=TRUE, sep=",")
library(dplyr)
library(plyr)
data <- tbl_df(data)
test <- data %>%
  filter(!is.na(wght)) %>%
  dplyr::group_by(subject_id) %>%
  mutate(maxwght=max(wght),meanwght=mean(wght)) %>%
  ungroup()
Find a .csv sample of my dataset here:
https://drive.google.com/file/d/1wGkSQyJXqSswThiNsqC26qaP7d3catyX/view?usp=sharing

Is this what you want? In my example below, the output shows the maximum of wght for each subject id. You could replace max() with mean(), for example, if you need the mean of wght for each subject id. Note that the summary has to come after group_by(); computing it before grouping would use the entire dataset.
library(dplyr)
data <- read.csv(file = "group_by_question.csv", header = TRUE, sep = ",")
test <- data %>%
  filter(!is.na(wght)) %>%
  group_by(subject_id) %>%
  summarise(value = max(wght)) %>%
  ungroup()
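
One likely cause of the behaviour in the question is that plyr is attached after dplyr. In that case plyr's mutate() masks dplyr's, and plyr's version ignores grouping. The sketch below keeps the per-row columns from the question and calls the dplyr verbs with an explicit `dplyr::` prefix, which works whether or not plyr is loaded. The sample data is made up for illustration.

```r
library(dplyr)

# Hypothetical repeated-measures data with one missing weight
data <- data.frame(
  subject_id = c(1, 1, 2, 2, 2),
  wght       = c(70, 72, NA, 80, 84)
)

test <- data %>%
  dplyr::filter(!is.na(wght)) %>%
  dplyr::group_by(subject_id) %>%
  dplyr::mutate(maxwght = max(wght), meanwght = mean(wght)) %>%  # computed per subject
  dplyr::ungroup()
```

Dropping `library(plyr)`, or loading it before dplyr, would be another way to avoid the masking.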
