I have a data.frame, df, that consists of four sites (site1 to site4). Each site has values for four parameters (A to D) from 2011 to 2014. I want to create a motion chart for site1.
library(dplyr)
siteID <- c(rep("site1", 16), rep("site2", 16), rep("site3", 16), rep("site4", 16))
YEAR <- as.numeric(rep(c("2011", "2012", "2013", "2014"), 16))
parameter <- c(rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4),
rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4),
rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4),
rep("A", 4), rep("B", 4), rep("C", 4), rep("D", 4))
value <- c(seq(1, 4, by=1), seq(10, 40, by=10), seq(12, 18, by=2), seq(5, 20, by=5),
seq(3, 12, by=3), sample(13:18, 4), sample(15:22, 4), sample(10:18, 4),
seq(7, 1, by=-2), sample(15:22, 4), sample(15:19, 4), sample(10:20, 4),
seq(8, 5, by=-1), seq(50, 20, by=-10), seq(16, 10, by=-2), seq(20, 5, by=-5))
df <- data.frame(siteID, YEAR, parameter, value)
df$YEAR <- as.numeric(df$YEAR)
df1 <- df %>%
dplyr::filter(siteID =="site1")
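A sketch of a more compact way to build the same identifier columns: rep() with each = and times = avoids spelling out every block by hand. The three vectors below are meant to match siteID, YEAR and parameter above. YEAR comes out as integer rather than double here (wrap it in as.numeric() if the exact type matters). The value column is left as in the original; calling set.seed() before it would make the sample() draws reproducible.

```r
# Compact rebuild of the identifier columns used in df
siteID    <- rep(paste0("site", 1:4), each = 16)                  # 16 rows per site
YEAR      <- rep(2011:2014, times = 16)                           # years cycle fastest
parameter <- rep(rep(c("A", "B", "C", "D"), each = 4), times = 4) # 4 years per parameter, once per site
```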
I created the motion chart for site1 using the following code:
library(googleVis)
site1 = gvisMotionChart(data=df1,
idvar="parameter",
timevar="YEAR",
chartid="site1")
plot(site1)
It worked fine.
However, both the x-axis and the y-axis defaulted to value, so I had to change the x-axis from value to YEAR by hand.
I wanted to change the defaults so that the x-axis is YEAR, colorvar is parameter, and sizevar is value. I tried this code:
site1_1 = gvisMotionChart(data=df1,
idvar="parameter",
timevar="YEAR",
chartid="site1",
xvar="YEAR",
yvar="value",
colorvar="parameter",
sizevar="value")
plot(site1_1)
It kept showing a loading message, but the chart was never drawn.
Any suggestions would be appreciated.
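I think the code below should get you most of the way there. The chart seems not to render when the same column is assigned to more than one role (for example YEAR as both timevar and xvar, or parameter as both idvar and colorvar), so the idea is to create copies of those columns and map the copies instead. All that's left is to set the chart options appropriately to get rid of the commas and such.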
df1 <- df %>%
dplyr::filter(siteID =="site1") %>%
mutate(Date = YEAR) %>%
mutate(colorValue = parameter) %>%
mutate(sizeValue = value)
library(googleVis)
site1 = gvisMotionChart(data=df1,
idvar="parameter",
timevar="YEAR",
chartid="site1",
xvar = "Date",
yvar = "value",
colorvar = "colorValue",
sizevar = "sizeValue")
plot(site1)
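If you would rather not depend on dplyr for the data step, the same column copying can be sketched in base R with subset() and transform(). The small df below is hypothetical toy data in the same shape as the question's, used only so the snippet runs on its own; the googleVis call is left commented out because it needs the package and a browser.

```r
# Hypothetical toy data in the same shape as df (two sites, two parameters)
df <- data.frame(siteID    = rep(c("site1", "site2"), each = 8),
                 YEAR      = rep(2011:2014, times = 4),
                 parameter = rep(rep(c("A", "B"), each = 4), times = 2),
                 value     = 1:16)

# Keep site1 and add copies so that no column is mapped to two chart roles
df1 <- subset(df, siteID == "site1")
df1 <- transform(df1, Date = YEAR, colorValue = parameter, sizeValue = value)

# The chart call itself is unchanged from the answer:
# library(googleVis)
# plot(gvisMotionChart(df1, idvar = "parameter", timevar = "YEAR",
#                      xvar = "Date", yvar = "value",
#                      colorvar = "colorValue", sizevar = "sizeValue"))
```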
I have the following three data frames:
area1 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(0, 100, 0),
sub_ua2 = c(100, 100, 100),
sub_ua3 = c(100, 0, 0))
area2 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(100, 100, 0),
sub_ua2 = c(100, 100, 0),
sub_ua3 = c(100, 0, 0))
df <- data.frame(ua = c(rep(1, 5), rep(2, 4), rep(3, 7)),
subua = c(rep("sub_ua1", 3), "sub_ua2", "sub_ua3",
"sub_ua1", "sub_ua1", "sub_ua2", "sub_ua3",
"sub_ua1", c(rep("sub_ua2", 2)), rep("sub_ua3", 4)),
value = c(rep(2, 3), rep(4, 3), rep(2, 2), rep(1, 8)))
What I'm trying to do is, for each ua in area1 and area2, keep only the sub_ua columns (sub_ua1 to sub_ua3) whose value is 100 in both data frames. For example, for ua == 1, sub_ua2 is 100 in both area1 and area2, so it is a sub_ua I want.
Then, using this list of sub_ua per ua, filter df to keep only the matching rows and their values.
The results should be:
For ua == 1, get both sub_ua2 and sub_ua3
For ua == 2, get both sub_ua1 and sub_ua2
For ua == 3, get sub_ua2
EDIT:
I was using the following approach to obtain a data.frame of rows and columns indices:
library(prodlim)
# Indices for data frame 1 and 2 for values = 100
indices_1 <- which(area1 == 100, arr.ind = TRUE)
indices_2 <- which(area2 == 100, arr.ind = TRUE)
# Rows where indices are matched between the two data frame indices
indices_rows <- na.omit(row.match(as.data.frame(indices_1), as.data.frame(indices_2)))
# Row-column indices where both data frames have values of 100
indices_2[indices_rows, ]
I just don't know how to use these indices to filter the final data set, df.
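If I understood correctly, this should work. Note that with the data shown, ua == 3 has no sub_ua that is 100 in both data frames (sub_ua2 is 100 in area1 but 0 in area2), so it does not appear in the result: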
area1 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(0, 100, 0),
sub_ua2 = c(100, 100, 100),
sub_ua3 = c(100, 0, 0))
area2 <- data.frame(ua = c(1, 2, 3),
sub_ua1 = c(100, 100, 0),
sub_ua2 = c(100, 100, 0),
sub_ua3 = c(100, 0, 0))
library(dplyr)
library(tidyr)
area1 %>%
left_join(area2, by = "ua", suffix = c(".area1",".area2")) %>%
pivot_longer(cols = -ua,names_to = "var",values_to = "value") %>%
separate(col = var,into = c("var","area"),sep = "\\.") %>%
pivot_wider(names_from = area,values_from = value) %>%
filter(area1 == 100, area2 == 100) %>%
select(-starts_with("area"))
# A tibble: 4 x 2
ua var
<dbl> <chr>
1 1 sub_ua2
2 1 sub_ua3
3 2 sub_ua1
4 2 sub_ua2
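The pipeline above stops at the (ua, var) pairs, while the last step in the question was to filter df with them. Below is a base-R sketch of the whole thing, in the spirit of the asker's which(arr.ind = TRUE) attempt but without prodlim, finishing with merge() as an inner-join filter. It assumes each ua appears exactly once in area1 and in area2. With dplyr, the equivalent last step would be semi_join(df, pairs, by = c("ua", "subua" = "var")) on the tibble printed above.

```r
area1 <- data.frame(ua = c(1, 2, 3), sub_ua1 = c(0, 100, 0),
                    sub_ua2 = c(100, 100, 100), sub_ua3 = c(100, 0, 0))
area2 <- data.frame(ua = c(1, 2, 3), sub_ua1 = c(100, 100, 0),
                    sub_ua2 = c(100, 100, 0), sub_ua3 = c(100, 0, 0))
df <- data.frame(ua = c(rep(1, 5), rep(2, 4), rep(3, 7)),
                 subua = c(rep("sub_ua1", 3), "sub_ua2", "sub_ua3",
                           "sub_ua1", "sub_ua1", "sub_ua2", "sub_ua3",
                           "sub_ua1", rep("sub_ua2", 2), rep("sub_ua3", 4)),
                 value = c(rep(2, 3), rep(4, 3), rep(2, 2), rep(1, 8)))

sub_cols <- c("sub_ua1", "sub_ua2", "sub_ua3")

# Line area2 up with area1 by ua (assumes each ua appears once per table)
area2_aligned <- area2[match(area1$ua, area2$ua), ]

# TRUE where both tables hold 100 for the same ua and sub_ua
both_100 <- as.matrix(area1[sub_cols]) == 100 & as.matrix(area2_aligned[sub_cols]) == 100

# Turn the TRUE cells into (ua, subua) pairs
idx  <- which(both_100, arr.ind = TRUE)
keep <- data.frame(ua = area1$ua[idx[, "row"]], subua = sub_cols[idx[, "col"]])

# Inner join: only rows of df whose (ua, subua) pair is in keep survive
filtered <- merge(df, keep, by = c("ua", "subua"))
filtered
```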
I am building a crime report in R and am comparing two separate data frames, one for the current year and one for the previous year. Both have the same structure. Is there a way to color the values in a flextable based on the previous year's counts? For example, if January 2020 had more homicides than January 2019, that value should be red; if January 2020 had fewer burglaries than January 2019, that value should be green; and so on for every month of the year and every crime. Here is a sample of the data:
df2019 <- data.frame(crime = c("assault", "homicide", "burglary"),
Jan = c(5, 2, 7),
Feb = c(2, 4, 0),
Mar = c(1, 2, 1))
df2020 <- data.frame(crime = c("assault", "homicide", "burglary"),
Jan = c(1, 2, 5),
Feb = c(1, 3, 0),
Mar = c(2, 2, 1))
My desired output is to have the df2020 values colored according to the df2019 values. I would then like to include the table in a PowerPoint presentation using the officer package.
Does anyone have any ideas? I have been exploring options in kable, kableExtra, and flextable, but can't find a solution that works across data frames. Thanks for the help!
Here is a solution:
library(flextable)
library(magrittr)
df2019 <- data.frame(crime = c("assault", "homicide", "burglary"),
Jan = c(5, 2, 7),
Feb = c(2, 4, 0),
Mar = c(1, 2, 1))
df2020 <- data.frame(crime = c("assault", "homicide", "burglary"),
Jan = c(1, 2, 5),
Feb = c(1, 3, 0),
Mar = c(2, 2, 1))
colors <- unlist(df2020[-1] - df2019[-1]) %>%
cut(breaks = c(-Inf, -.1, 0.1, Inf),
labels = c("green", "transparent", "red")) %>%
as.character()
flextable(df2020) %>%
bg(j = ~ . -crime, bg = colors) %>%
theme_vanilla() %>%
autofit() %>% save_as_pptx(path = "test.pptx")
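The core of this solution is the comparison, not flextable itself. A base-R sketch of just that step builds a crime-by-month matrix of colors; for whole-number counts, sign() of the difference gives the same three classes as the cut() breaks above. It assumes both data frames list the crimes in the same row order, which the stopifnot() line checks.

```r
df2019 <- data.frame(crime = c("assault", "homicide", "burglary"),
                     Jan = c(5, 2, 7), Feb = c(2, 4, 0), Mar = c(1, 2, 1))
df2020 <- data.frame(crime = c("assault", "homicide", "burglary"),
                     Jan = c(1, 2, 5), Feb = c(1, 3, 0), Mar = c(2, 2, 1))

# The comparison is positional, so both tables must list crimes in the same order
stopifnot(identical(df2019$crime, df2020$crime))

# Change from the previous year for every crime (rows) and month (columns)
delta <- as.matrix(df2020[-1]) - as.matrix(df2019[-1])

# sign() gives -1, 0 or 1; adding 2 turns that into an index 1, 2 or 3
shades <- c("green", "transparent", "red")
color_mat <- matrix(shades[sign(delta) + 2],
                    nrow = nrow(delta),
                    dimnames = list(df2020$crime, colnames(delta)))
color_mat
```

Each column of color_mat lines up with the matching month column, so it could also be applied one month at a time with flextable's bg() (for example j = "Jan" with color_mat[, "Jan"]); check the flextable documentation for how it recycles color vectors.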
I have a data table of integer coordinates that align between two groups labelled A and B. For example:
library(data.table)
dt_long <- data.table(LABEL_A = c(rep("A", 20), rep("A", 15), rep("A", 10), rep("A", 15), rep("A", 10)),
SEQ_A = c(11:30, 61:75, 76:85, 86:100, 110:119),
LABEL_B = c(rep("C", 20), rep("D", 15), rep("F", 10), rep("G", 15), rep("D", 10)),
SEQ_B = c(1:20, 25:11, 16:25, 15:1, 1:5, 8:12))
How can I add an ID column to this data.table with a unique ID for each continuous aligned sequence? Each aligned sequence needs a separate ID if either SEQ_A or SEQ_B is not sequentially continuous, or if it belongs to a different group (i.e. LABEL). For example:
dt_long_ID <- data.table(LABEL_A = c(rep("A", 20), rep("A", 15), rep("A", 10), rep("A", 15), rep("A", 10)),
SEQ_A = c(11:30, 61:75, 76:85, 86:100, 110:119),
LABEL_B= c(rep("C", 20), rep("D", 15), rep("F", 10), rep("G",15), rep("D", 10)),
SEQ_B = c(1:20, 25:11, 16:25, 15:1, 1:5, 8:12),
ID = c(rep(1, 20), rep(2, 15), rep(3, 10), rep(4, 15), rep(5, 5), rep(6, 5) ))
Updated answer based on the clarified question and the updated data. This will work whether or not the LABEL columns are numeric.
# helper function for the sequential check
# the & !is.na() just corrects for the first NA value introduced by shift()
foo = function(x) cumsum(abs(x - shift(x)) > 1 & !is.na(shift(x)))
dt_long_ID[, ID2 := .GRP, by = .(rleid(LABEL_A), rleid(LABEL_B), foo(SEQ_A), foo(SEQ_B))]
all(dt_long_ID$ID == dt_long_ID$ID2)
# [1] TRUE
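The same logic can be sketched in base R without data.table. The helpers seq_breaks(), label_breaks() and run_id() are hypothetical names introduced here: each key is a running count that only ever increases, so "some key changed from the previous row" starts exactly the same groups as grouping by all four keys with .GRP.

```r
# Hypothetical helper: counter that steps up when consecutive values jump by more than 1
seq_breaks <- function(x) cumsum(c(FALSE, abs(diff(x)) > 1))

# Hypothetical helper: counter that steps up whenever the label changes
label_breaks <- function(x) cumsum(c(TRUE, x[-1] != x[-length(x)]))

run_id <- function(label_a, seq_a, label_b, seq_b) {
  keys <- cbind(label_breaks(label_a), seq_breaks(seq_a),
                label_breaks(label_b), seq_breaks(seq_b))
  n <- nrow(keys)
  # A row opens a new ID if any of the four keys differs from the previous row
  changed <- rowSums(keys[-1, , drop = FALSE] != keys[-n, , drop = FALSE]) > 0
  cumsum(c(TRUE, changed))
}

LABEL_A <- rep("A", 70)
SEQ_A   <- c(11:30, 61:75, 76:85, 86:100, 110:119)
LABEL_B <- c(rep("C", 20), rep("D", 15), rep("F", 10), rep("G", 15), rep("D", 10))
SEQ_B   <- c(1:20, 25:11, 16:25, 15:1, 1:5, 8:12)

ID <- run_id(LABEL_A, SEQ_A, LABEL_B, SEQ_B)
table(ID)
```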
I have a list of data frames that I want to consolidate into one data frame. I am trying to solve two problems:
How to sum the columns across the data frames, date by date
How to include only the dates common to all the data frames in the list
This is what I have:
library(tidyverse)
library(lubridate)
df1 <- data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04",
"2019-02-05")),
x = c(1, 2, 3, 4, 5),
y = c(2, 3, 4, 5, 6),
z = c(3, 4, 5, 6, 7)
)
df2 <- data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-04", "2019-02-05")),
x = c(1, 2, 3, 4),
y = c(2, 3, 4, 5),
z = c(3, 4, 5, 6)
)
df3 <- data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04")),
x = c(1, 2, 3, 4),
y = c(2, 3, 4, 5),
z = c(3, 4, 5, 6)
)
dfl <- list(df1, df2, df3)
This is the output I am looking for:
data.frame(
date = ymd(c("2019-02-01", "2019-02-02", "2019-02-04")),
x = c(3, 6, 11),
y = c(6, 9, 14),
z = c(9, 12, 17)
)
I have tried inner_join and tried looping through the list but it got too complicated and I still didn't manage to land on the answer.
Is there a cleaner way to get to the final answer?
How about this?
bind_rows(dfl) %>%
group_by(date) %>%
mutate(n = 1) %>%
summarise_all(sum) %>%
filter(n == length(dfl)) %>%
select(-n)
## A tibble: 3 x 4
# date x y z
# <date> <dbl> <dbl> <dbl>
#1 2019-02-01 3 6 9
#2 2019-02-02 6 9 12
#3 2019-02-04 11 14 17
This assumes that no date appears more than once within any single data frame of dfl; otherwise n would overcount and the filter on length(dfl) would be wrong.
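For comparison, a base-R sketch that finds the common dates first with Reduce(intersect, ...) and then sums with aggregate(). The data is rebuilt with as.Date() (same values as above) so the block runs without tidyverse or lubridate; dates are compared as text to keep intersect() simple. Unlike the n-counting approach, this does not depend on each date appearing once per data frame to pick the common dates, although duplicated dates would still all be summed.

```r
df1 <- data.frame(date = as.Date(c("2019-02-01", "2019-02-02", "2019-02-03",
                                   "2019-02-04", "2019-02-05")),
                  x = 1:5, y = 2:6, z = 3:7)
df2 <- data.frame(date = as.Date(c("2019-02-01", "2019-02-02", "2019-02-04", "2019-02-05")),
                  x = 1:4, y = 2:5, z = 3:6)
df3 <- data.frame(date = as.Date(c("2019-02-01", "2019-02-02", "2019-02-03", "2019-02-04")),
                  x = 1:4, y = 2:5, z = 3:6)
dfl <- list(df1, df2, df3)

# Dates present in every data frame of the list
common <- Reduce(intersect, lapply(dfl, function(d) format(d$date)))

# Stack everything, keep only the common dates, then sum x, y, z per date
stacked <- do.call(rbind, dfl)
kept    <- stacked[format(stacked$date) %in% common, ]
result  <- aggregate(cbind(x, y, z) ~ date, data = kept, FUN = sum)
result
```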
I have two data frames:
# sample() stands in for random::randomNumbers(), which queries random.org over the internet
DataFrame1 <- data.frame(StudentId = c(1:20), Subject = c(rep("Algebra", 4), rep("Geometry", 4), rep("English", 4), rep("Zoology", 4), rep("Botany", 4)), CGPA = sample(70:100, 20, replace = TRUE), Country = c(rep("USA", 4), rep("UK", 4), rep("Germany", 4), rep("France", 4), rep("Japan", 4)))
and
DataFrame2 <- data.frame(StudentId = c(1:10), State = c(rep("NYC", 2), rep("Illinois", 2), rep("Texas", 2), rep("Virginia", 2), rep("Florida", 2)), Age = sample(16:20, 10, replace = TRUE), Gender = c(rep("Male", 3), rep("Female", 3), rep("Male", 2), rep("Female", 2)))
I can merge the two with an inner join:
merge(DataFrame1, DataFrame2)
How can I cross join the two data frames without repeating values?
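Try merge(DataFrame1, DataFrame2, all = TRUE). With all = TRUE, merge() performs a full outer join on the shared column (StudentId): every student from either data frame appears once, and columns with no match are filled with NA.
For a true cross join (every row of DataFrame1 paired with every row of DataFrame2, so 20 × 10 = 200 rows), use by = NULL: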
knitr::kable(merge(x = DataFrame1, y = DataFrame2, by = NULL))
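The two answers describe different joins. A small sketch with hypothetical toy tables (not the asker's data) makes the difference visible through the row counts: four students, only two of whom have a state on record.

```r
# Hypothetical toy tables
a <- data.frame(StudentId = 1:4, Subject = c("Algebra", "Geometry", "English", "Zoology"))
b <- data.frame(StudentId = 1:2, State = c("NYC", "Texas"))

inner <- merge(a, b)              # inner join on StudentId: ids 1 and 2 only
full  <- merge(a, b, all = TRUE)  # full outer join: ids 3 and 4 get State = NA
cross <- merge(a, b, by = NULL)   # Cartesian product: 4 x 2 = 8 rows

sapply(list(inner = inner, full = full, cross = cross), nrow)
```

In the cross join, the shared StudentId column is kept twice, with the default suffixes (StudentId.x and StudentId.y).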