I want to find the difference in the values of the same type.
Please refer to the sample dataframe below:
df <- data.frame(
x = c("Jimmy Page","Jimmy Page","Jimmy Page","Jimmy Page", "John Smith", "John Smith", "John Smith", "Joe Root", "Joe Root", "Joe Root", "Joe Root", "Joe Root"),
y = c(1,2,3,4,5,7,89,12,34,67,95,9674 )
)
For each name I would like the difference between consecutive values, e.g. Jimmy Page = 1 and Jimmy Page = 2 gives a difference of 1.
Where the next row belongs to a different name, the difference should be NA.
You can use diff() inside ave(), padding each group with a trailing NA:
df$diff <- ave(df$y, df$x, FUN=function(z) c(diff(z), NA))
df
# x y diff
#1 Jimmy Page 1 1
#2 Jimmy Page 2 1
#3 Jimmy Page 3 1
#4 Jimmy Page 4 NA
#5 John Smith 5 2
#6 John Smith 7 82
#7 John Smith 89 NA
#8 Joe Root 12 22
#9 Joe Root 34 33
#10 Joe Root 67 28
#11 Joe Root 95 9579
#12 Joe Root 9674 NA
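If the NA should instead mark the first row of each group (as in the dplyr output further down), the same ave() call can pad at the front. A minimal sketch that rebuilds the question's df; the column name diff_prev is only an illustrative choice:

```r
df <- data.frame(
  x = c(rep("Jimmy Page", 4), rep("John Smith", 3), rep("Joe Root", 5)),
  y = c(1, 2, 3, 4, 5, 7, 89, 12, 34, 67, 95, 9674)
)

# Pad with NA at the front so each row holds
# "this value minus the previous value in the same group".
df$diff_prev <- ave(df$y, df$x, FUN = function(z) c(NA, diff(z)))
df
```

Which end gets the NA only depends on whether the difference should be read as "to the next row" or "from the previous row".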
library(tidyverse)
df <-
data.frame(
x = c(
"Jimmy Page",
"Jimmy Page",
"Jimmy Page",
"Jimmy Page",
"John Smith",
"John Smith",
"John Smith",
"Joe Root",
"Joe Root",
"Joe Root",
"Joe Root",
"Joe Root"
),
y = c(1, 2, 3, 4, 5, 7, 89, 12, 34, 67, 95, 9674)
)
df %>%
group_by(x) %>%
mutate(res = c(NA, diff(y))) %>%
ungroup()
#> # A tibble: 12 x 3
#> x y res
#> <chr> <dbl> <dbl>
#> 1 Jimmy Page 1 NA
#> 2 Jimmy Page 2 1
#> 3 Jimmy Page 3 1
#> 4 Jimmy Page 4 1
#> 5 John Smith 5 NA
#> 6 John Smith 7 2
#> 7 John Smith 89 82
#> 8 Joe Root 12 NA
#> 9 Joe Root 34 22
#> 10 Joe Root 67 33
#> 11 Joe Root 95 28
#> 12 Joe Root 9674 9579
Created on 2021-09-14 by the reprex package (v2.0.1)
I have a pretty good understanding of R but am new to JSON file types and best practices for parsing. I'm having difficulties building a data frame from a raw JSON file. The JSON file (data below) is made up of repeated measure data that has multiple observations per user.
When the raw file is read into R:
jdata<-read_json("./raw.json")
It comes in as a "List of 1" with that list being user_ids. Within each user_id are further lists, like so -
jdata$user_id$`sjohnson`$date$`2020-09-25`$city
The very last position actually splits into two options - $city or $zip. At the highest level, there are about 89 users in the complete file.
My goal would be to end up with a rectangular data frame or multiple data frames that I can merge together like this - where I don't actually need the zip code.
example table
I've tried jsonlite along with tidyverse, and the farthest I seem to get is a data frame with one variable at the smallest level, with cities and zip codes on alternating rows, using this:
df <- as.data.frame(matrix(unlist(jdata), nrow=length(unlist(jdata["users"]))))
Any help/suggestions to get closer to the table above would be much appreciated. I have a feeling I'm failing at looping it back through the different levels.
Here is an example of the raw json file structure:
{
  "user_id": {
    "sjohnson": {
      "date": {
        "2020-09-25": {
          "city": "Denver",
          "zip": "80014"
        },
        "2020-10-01": {
          "city": "Atlanta",
          "zip": "30301"
        },
        "2020-11-04": {
          "city": "Jacksonville",
          "zip": "14001"
        }
      }
    },
    "asmith": {
      "date": {
        "2020-10-16": {
          "city": "Cleavland",
          "zip": "34321"
        },
        "2020-11-10": {
          "city": "Elmhurst",
          "zip": "00013"
        },
        "2020-11-10 08:49:36": {
          "location": null,
          "timestamp": 1605016176013
        }
      }
    }
  }
}
Another (straightforward) solution, with rrapply() from the rrapply package doing the heavy lifting:
library(rrapply)
library(dplyr)
rrapply(jdata, how = "melt") %>%
filter(L5 == "city") %>%
select(user_id = L2, date = L4, city = value)
#> user_id date city
#> 1 sjohnson 2020-09-25 Denver
#> 2 sjohnson 2020-10-01 Atlanta
#> 3 sjohnson 2020-11-04 Jacksonville
#> 4 asmith 2020-10-16 Cleavland
#> 5 asmith 2020-11-10 Elmhurst
Data
jdata <- jsonlite::fromJSON('{
"user_id": {
"sjohnson": {
"date": {
"2020-09-25": {
"city": "Denver",
"zip": "80014"
},
"2020-10-01": {
"city": "Atlanta",
"zip": "30301"
},
"2020-11-04": {
"city": "Jacksonville",
"zip": "14001"
}
}
},
"asmith": {
"date": {
"2020-10-16": {
"city": "Cleavland",
"zip": "34321"
},
"2020-11-10": {
"city": "Elmhurst",
"zip": "00013"
},
"2020-11-10 08:49:36": {
"location": null,
"timestamp": 1605016176013
}
}
}
}
}')
We can build our desired structure step by step:
library(jsonlite)
library(tidyverse)
df <- fromJSON('{
"user_id": {
"sjohnson": {
"date": {
"2020-09-25": {
"city": "Denver",
"zip": "80014"
},
"2020-10-01": {
"city": "Atlanta",
"zip": "30301"
},
"2020-11-04": {
"city": "Jacksonville",
"zip": "14001"
}
}
},
"asmith": {
"date": {
"2020-10-16": {
"city": "Cleavland",
"zip": "34321"
},
"2020-11-10": {
"city": "Elmhurst",
"zip": "00013"
},
"2020-11-10 08:49:36": {
"location": null,
"timestamp": 1605016176013
}
}
}
}
}')
df %>%
bind_rows() %>%
pivot_longer(everything(), names_to = 'user_id') %>%
unnest_longer(value, indices_to = 'date') %>%
unnest_longer(value, indices_to = 'var') %>%
mutate(city = unlist(value)) %>%
filter(var == 'city') %>%
select(-var, -value)
which gives:
# A tibble: 5 x 3
user_id date city
<chr> <chr> <chr>
1 sjohnson 2020-09-25 Denver
2 sjohnson 2020-10-01 Atlanta
3 sjohnson 2020-11-04 Jacksonville
4 asmith 2020-10-16 Cleavland
5 asmith 2020-11-10 Elmhurst
Alternative solution inspired by @Greg, where we change the last lines of the pipeline:
df %>%
bind_rows() %>%
pivot_longer(everything(), names_to = 'user_id') %>%
unnest_longer(value, indices_to = 'date') %>%
unnest_longer(value, indices_to = 'var') %>%
mutate(value = unlist(value)) %>%
pivot_wider(names_from = "var") %>%
select(user_id, date, city)
This gives almost the same results with the exception of one additional case where city is NA:
# A tibble: 6 x 3
user_id date city
<chr> <chr> <chr>
1 sjohnson 2020-09-25 Denver
2 sjohnson 2020-10-01 Atlanta
3 sjohnson 2020-11-04 Jacksonville
4 asmith 2020-10-16 Cleavland
5 asmith 2020-11-10 Elmhurst
6 asmith 2020-11-10 08:49:36 NA
Here's a solution in the tidyverse: a custom function unnestable() designed to recursively unnest into a table the contents of a list like you describe. See Details for particulars regarding the format of such a list and its table.
Solution
First ensure the necessary libraries are present:
library(jsonlite)
library(tidyverse)
Then define the unnestable() function as follows:
unnestable <- function(v) {
# If we've reached the bottommost list, simply treat it as a table...
if(all(sapply(
X = v,
# Check that each element is a single value (or NULL).
FUN = function(x) {
is.null(x) || purrr::is_scalar_atomic(x)
},
simplify = TRUE
))) {
v %>%
# Replace any NULLs with NAs to preserve blank fields...
sapply(
FUN = function(x) {
if(is.null(x))
NA
else
x
},
simplify = FALSE
) %>%
# ...and convert this bottommost list into a table.
tidyr::as_tibble()
}
# ...but if this list contains another nested list, then recursively unnest its
# contents and combine their tabular results.
else if(purrr::is_scalar_list(v)) {
# Take the contents within the nested list...
v[[1]] %>%
# ...apply this 'unnestable()' function to them recursively...
sapply(
FUN = unnestable,
simplify = FALSE,
USE.NAMES = TRUE
) %>%
# ...and stack their results.
dplyr::bind_rows(.id = names(v)[1])
}
# Otherwise, the format is unrecognized and yields no results.
else {
NULL
}
}
Finally, process the JSON data as follows:
# Read the JSON file into an R list.
jdata <- jsonlite::read_json("./raw.json")
# Flatten the R list into a table, via 'unnestable()'
flat_data <- unnestable(jdata)
# View the raw table.
flat_data
Naturally, you can reformat this table however you desire:
library(lubridate)
flat_data <- flat_data %>%
dplyr::transmute(
user_id = as.character(user_id),
date = lubridate::as_datetime(date),
city = as.character(city)
) %>%
dplyr::distinct()
# View the reformatted table.
flat_data
Results
Given a raw.json file like that sampled here
{
"user_id": {
"sjohnson": {
"date": {
"2020-09-25": {
"city": "Denver",
"zip": "80014"
},
"2020-10-01": {
"city": "Atlanta",
"zip": "30301"
},
"2020-11-04": {
"city": "Jacksonville",
"zip": "14001"
}
}
},
"asmith": {
"date": {
"2020-10-16": {
"city": "Cleavland",
"zip": "34321"
},
"2020-11-10": {
"city": "Elmhurst",
"zip": "00013"
},
"2020-11-10 08:49:36": {
"location": null,
"timestamp": 1605016176013
}
}
}
}
}
then unnestable() will yield a tibble like this
# A tibble: 6 x 6
user_id date city zip location timestamp
<chr> <chr> <chr> <chr> <lgl> <dbl>
1 sjohnson 2020-09-25 Denver 80014 NA NA
2 sjohnson 2020-10-01 Atlanta 30301 NA NA
3 sjohnson 2020-11-04 Jacksonville 14001 NA NA
4 asmith 2020-10-16 Cleavland 34321 NA NA
5 asmith 2020-11-10 Elmhurst 00013 NA NA
6 asmith 2020-11-10 08:49:36 NA NA NA 1605016176013
which dplyr will format into the result below:
# A tibble: 6 x 3
user_id date city
<chr> <dttm> <chr>
1 sjohnson 2020-09-25 00:00:00 Denver
2 sjohnson 2020-10-01 00:00:00 Atlanta
3 sjohnson 2020-11-04 00:00:00 Jacksonville
4 asmith 2020-10-16 00:00:00 Cleavland
5 asmith 2020-11-10 00:00:00 Elmhurst
6 asmith 2020-11-10 08:49:36 NA
Details
List Format
To be precise, the list represents nested groupings by the fields {group_1, group_2, ..., group_n}, and it must be of the form:
list(
group_1 = list(
"value_1" = list(
group_2 = list(
"value_1.1" = list(
# .
# .
# .
group_n = list(
"value_1.1.….n.1" = list(
field_a = 1,
field_b = TRUE
),
"value_1.1.….n.2" = list(
field_a = 2,
field_c = "2"
)
# ...
)
),
"value_1.2" = list(
# .
# .
# .
)
# ...
)
),
"value_2" = list(
group_2 = list(
"value_2.1" = list(
# .
# .
# .
group_n = list(
"value_2.1.….n.1" = list(
field_a = 3,
field_d = 3.0
)
# ...
)
),
"value_2.2" = list(
# .
# .
# .
)
# ...
)
)
# ...
)
)
Table Format
Given a list of this form, unnestable() will flatten it into a table of the following form:
# A tibble: … x …
group_1 group_2 ... group_n field_a field_b field_c field_d
<chr> <chr> ... <chr> <dbl> <lgl> <chr> <dbl>
1 value_1 value_1.1 ... value_1.1.….n.1 1 TRUE NA NA
2 value_1 value_1.1 ... value_1.1.….n.2 2 NA 2 NA
3 value_1 value_1.2 ... value_1.2.….n.1 ... ... ... ...
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
j value_2 value_2.1 ... value_2.1.….n.1 3 NA NA 3
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
k value_2 value_2.2 ... value_2.2.….n.1 ... ... ... ...
⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮ ⋮
I'm using the Vis.js timeline for the first time. I want a year-wise timeline instead of one combined timeline. I tried the groups option in Vis.js instead of items, but it didn't work.
After a page refresh I get a timeline like this:
But I want a timeline like this:
Can you please help me with this problem?
Thanks
Code:
var container = document.getElementById('visualization');
// Create a DataSet (allows two way data-binding)
var items = new vis.DataSet(
[
{
"content": "Application 31 August 2004 - 0.0 ",
"start": "2004-08-31",
"id": 0
},
{
"content": "cricket 10 October 2007 - 3.11 Years",
"start": "2007-10-10",
"id": 1
},
{
"content": "Inter 09 January 2008 - 3.36 Years",
"start": "2008-01-09",
"id": 2
},
{
"content": "Final 09 April 2008 - 3.61 Years",
"start": "2008-04-09",
"id": 3
},
{
"content": "exam 07 July 2008 - 3.85 Years",
"start": "2008-07-07",
"id": 4
},
{
"content": "asf 18 July 2008 - 3.88 Years",
"start": "2008-07-18",
"code": "all",
"id": 5
},
{
"content": "pal 01 August 2008 - 3.92 Years",
"start": "2008-08-01",
"id": 6
},
{
"content": "Final 08 January 2009 - 4.36 Years",
"start": "2009-01-08",
"id": 7
},
{
"content": "App 01 June 2009 - 4.75 Years",
"start": "2009-06-01",
"id": 8
},
{
"content": "N 31 August 2009 - 5.0 Years",
"start": "2009-08-31",
"id": 9
},
{
"content": "Fl 09 March 2010 - 5.52 Years",
"start": "2010-03-09",
"id": 10
},
{
"content": "Request 10 June 2010 - 5.78 Years",
"start": "2010-06-10",
"id": 11
},
{
"content": "Abn 15 June 2010 - 5.79 Years",
"start": "2010-06-15",
"id": 12
},
{
"content": "Non-Final 17 November 2010 - 6.22 Years",
"start": "2010-11-17",
"id": 13
},
{
"content": "Final R13 April 2011 - 6.62 Years",
"start": "2011-04-13",
"id": 14
},
{
"content": "App 07 September 2011 - 7.02 Years",
"start": "2011-09-07",
"id": 15,
}
]
);
// Configuration for the Timeline
var options = {
min: new Date(2000, 1, 5),
max: new Date(209,3,2),
// autoResize: false,
height: '200px'
};
// Create a Timeline
var timeline = new vis.Timeline(container, items, options);
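You can set stack: false in the options object. By default (stack: true) overlapping items are stacked on top of each other; with stack: false they are all drawn on the same row, which should give the layout you are after.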
I have a tibble and a list which I would like to write to a json file.
# A tibble: 2 x 12
i n c x
<chr> <chr> <chr> <chr>
1 NYC New York City United States LON,271;BOS,201
2 LON London United Kingdom NYC,270
I would like to replace the 'x' column with a list.
When I try to merge by the 'i' column with the element of the list, a lot of data is duplicated... :/
sample list:
$NYC
d p
1: LON 271
2: BOS 201
$LON
d p
1: NYC 270
I would like to end up with something that looks like this:
[
{
"i": "NYC",
"n": "New York City",
"c": "United States",
"C": "US",
"r": "Northern America",
"F": 66.256,
"L": -166.063,
"b": 94.42,
"s": 0.752,
"q": 4417,
"t": "0,0,0,0,0",
"x": [{
"d": "LON",
"p": 271
},
{
"d": "BOS",
"p": 201
}]
}
...
]
I'm thinking there should be a way to write the json file without merging the list and the tibble, or maybe there is a way to merge them in a ragged way ?
Ah, I just had another idea: maybe I can convert my data frame to a list and then use Reduce to combine the lists...
http://www.sharecsv.com/s/2e1dc764430c6fe746d2299f71879c2e/routes-before-split.csv
http://www.sharecsv.com/s/b114e2cc6236bd22b23298035fb7e042/tibble.csv
We may do the following:
tbl
# A tibble: 1 x 13
# X i n c C r F L b s q t x
# <int> <fct> <fct> <fct> <fct> <fct> <dbl> <dbl> <dbl> <dbl> <int> <fct> <fct>
# 1 1 LON London United Kingd… GB Northern Eur… 51.5 -0.127 55.4 1.25 2088 0,0,1,3… AAL,15;AAR,15;A…
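library(tidyverse)
library(jsonlite)  # provides toJSON(), used for the output below
# x is a factor here, so convert it to character before splitting
tbl$x <- map(as.character(tbl$x), ~ strsplit(., ";|,")[[1]] %>%
  {data.frame(d = .[c(TRUE, FALSE)], p = as.numeric(.[c(FALSE, TRUE)]))})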
The latter two lines are a shortened version of this base R equivalent:
tbl$x <- lapply(as.character(tbl$x), function(r) {
  tmp <- strsplit(r, ";|,")[[1]]
  data.frame(d = tmp[seq(1, length(tmp), 2)],
             p = as.numeric(tmp[seq(2, length(tmp), 2)]))
})
We go over the x column, split each element on ; and , and then use the fact that the odd-numbered pieces correspond to the d column in the desired outcome and the even-numbered pieces to the p column.
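To make the odd/even trick concrete, here is a small worked example in base R on a single string taken from the question's x column. The variable names tmp, d and p are only illustrative:

```r
x <- "LON,271;BOS,201"

# Split on both separators at once: ";" between pairs, "," within a pair.
tmp <- strsplit(x, ";|,")[[1]]
tmp
# "LON" "271" "BOS" "201"

# c(TRUE, FALSE) is recycled along tmp, so it keeps positions 1, 3, ...
d <- tmp[c(TRUE, FALSE)]
# c(FALSE, TRUE) keeps positions 2, 4, ...
p <- as.numeric(tmp[c(FALSE, TRUE)])

data.frame(d = d, p = p)
```

The logical-recycling form and the seq(1, length(tmp), 2) form select the same elements; the first is just shorter to type.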
Output:
toJSON(tbl, pretty = TRUE)
[
{
"X": 1,
"i": "LON",
"n": "London",
"c": "United Kingdom",
"C": "GB",
"r": "Northern Europe",
"F": 51.508,
"L": -0.127,
"b": 55.43,
"s": 1.25,
"q": 2088,
"t": "0,0,1,3,1",
"x": [
{
"d": "AAL",
"p": 15
},
{
"d": "AAR",
"p": 15
},
{
"d": "ABZ",
"p": 48
}
]
}
]
I am trying to get a list of named numbers to be a data.frame for easier plotting in ggplot2. My list looks like this:
dat <- list()
dat[[1]] <- c( 816, 609, 427, 426, 426, 419, 390, 353, 326, 301)
dat[[2]] <- c(96, 95, 94, 74, 66, 59, 51, 50, 43, 42)
dat[[3]] <- c(2219, 1742, 1689, 1590, 995, 823, 587, 562, 554, 535)
names(dat[[1]]) <-
c("new york city", "new york times", "amazon services llc", "services llc amazon",
"llc amazon eu", "couple weeks ago", "incorporated item pp", "two years ago",
"new york n.y", "world war ii")
names(dat[[2]]) <-
c("new york city", "president barack obama", "two years ago" ,
"st louis county", "gov chris christie", "first time since" ,
"world war ii", "three years ago", "new york times", "four years ago")
names(dat[[3]]) <-
c("let us know", "happy mothers day", "happy new year",
"happy mother's day", "cinco de mayo", "looking forward seeing",
"just got back", "keep good work", "come see us", "love love love")
names(dat) <- c("blogs","news","twitter")
dat
I have tried to unlist() this data, and I know there is a simple way to do this. Perhaps in data.table or dplyr. But I always get funny results.
The desired form is:
dat1 <- data.frame(ngram = c("new york city", "new york times", "amazon services llc", "services llc amazon",
"llc amazon eu", "couple weeks ago", "incorporated item pp", "two years ago",
"new york n.y", "world war ii"),
freq = c( 816, 609, 427, 426, 426, 419, 390, 353, 326, 301),
text = c("Blogs"))
dat2 <- data.frame(ngram = c("new york city", "president barack obama", "two years ago" ,
"st louis county", "gov chris christie", "first time since" ,
"world war ii", "three years ago", "new york times", "four years ago"),
freq = c(96, 95, 94, 74, 66, 59, 51, 50, 43, 42),
text = "News")
dat3 <- data.frame(ngram = c("let us know", "happy mothers day", "happy new year",
"happy mother's day", "cinco de mayo", "looking forward seeing",
"just got back", "keep good work", "come see us", "love love love"),
freq = c(2219, 1742, 1689, 1590, 995, 823, 587, 562, 554, 535),
text = "Twitter")
dat <- rbind(dat1,dat2,dat3)
dat
Maybe
purrr::map_dfr(.x = dat,tibble::enframe,.id = "text")
# A tibble: 30 x 3
text name value
<chr> <chr> <dbl>
1 blogs new york city 816
2 blogs new york times 609
3 blogs amazon services llc 427
4 blogs services llc amazon 426
5 blogs llc amazon eu 426
6 blogs couple weeks ago 419
7 blogs incorporated item pp 390
8 blogs two years ago 353
9 blogs new york n.y 326
10 blogs world war ii 301
# ... with 20 more rows
Still need to rename two variables, but I think that's pretty close?
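The renaming can probably be folded into the same call, since enframe() accepts name and value arguments and map_dfr() forwards extra arguments to the function it applies. A minimal sketch on a trimmed-down copy of the question's list (the two-element vectors are only there to keep the example short):

```r
library(purrr)
library(tibble)

# Trimmed sample of the question's data, for illustration only.
dat <- list(
  blogs = c("new york city" = 816, "new york times" = 609),
  news  = c("new york city" = 96, "president barack obama" = 95)
)

# name/value are passed through to enframe(); .id records the list name.
res <- map_dfr(dat, enframe, name = "ngram", value = "freq", .id = "text")
res
```

This gives the columns text, ngram and freq directly, so only the column order and the capitalisation of text differ from the desired form.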
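A solution using do.call and gather. Note that cbind() takes its row names from the first vector only, so the ngram column below repeats the blogs labels for news and twitter; the labels are only correct when every vector carries the same names in the same order: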
library(tidyverse)
do.call(cbind, dat) %>% as.data.frame() %>% rownames_to_column("ngram") %>%
gather(text, freq, - ngram)
# ngram text freq
# 1 new york city blogs 816
# 2 new york times blogs 609
# 3 amazon services llc blogs 427
# 4 services llc amazon blogs 426
# 5 llc amazon eu blogs 426
# 6 couple weeks ago blogs 419
# 7 incorporated item pp blogs 390
# 8 two years ago blogs 353
# 9 new york n.y blogs 326
# 10 world war ii blogs 301
# 11 new york city news 96
# 12 new york times news 95
# 13 amazon services llc news 94
# 14 services llc amazon news 74
# 15 llc amazon eu news 66
# 16 couple weeks ago news 59
# 17 incorporated item pp news 51
# 18 two years ago news 50
# 19 new york n.y news 43
# 20 world war ii news 42
# 21 new york city twitter 2219
# 22 new york times twitter 1742
# 23 amazon services llc twitter 1689
# 24 services llc amazon twitter 1590
# 25 llc amazon eu twitter 995
# 26 couple weeks ago twitter 823
# 27 incorporated item pp twitter 587
# 28 two years ago twitter 562
# 29 new york n.y twitter 554
# 30 world war ii twitter 535
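cbind() labels rows with the names of the first vector only, which is why the ngram column above repeats the blogs labels for news and twitter. If each vector should keep its own names, one option is to build one data frame per list element and stack them. This is a base R sketch, not the approach of the answer above, shown on a trimmed-down copy of the question's list:

```r
# Trimmed sample of the question's data, for illustration only.
dat <- list(
  blogs   = c("new york city" = 816, "new york times" = 609),
  news    = c("new york city" = 96, "president barack obama" = 95),
  twitter = c("let us know" = 2219, "happy mothers day" = 1742)
)

# One data frame per source, each using that vector's own names as ngram.
long <- do.call(rbind, lapply(names(dat), function(src) {
  data.frame(
    ngram = names(dat[[src]]),
    freq  = unname(dat[[src]]),
    text  = src,
    stringsAsFactors = FALSE
  )
}))
rownames(long) <- NULL
long
```

Because the names are read per vector, the sources do not need to share the same ngrams or even the same length.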