Shift cell value from one timestamp to another in R - r

Is it possible to move the value of one cell in a column from one timestamp to another in a time series without losing any other data? I have tried the shift and slide functions, but they replace the data with NA values.
I have also tried mutate, but it changes the whole column. Is there a function or method for this kind of manipulation?
E.g., convert:
Date_Time | x | y
01-01-2016 | 1 | 2
02-01-2016 | 3 | 4
03-01-2016 | 5 | 6
04-01-2016 | 2 | 5
to:
Date_Time | x | y
01-01-2016 | 5 | 2
02-01-2016 | 3 | 4
03-01-2016 | 1 | 6
04-01-2016 | 2 | 5
or slide the data vertically
Date_Time | x | y
01-01-2016 | 2 | 2
02-01-2016 | 1 | 4
03-01-2016 | 3 | 6
04-01-2016 | 5 | 5

To swap two values, you need to hold one in a temporary variable. We can write a simple function:
swap = function(x, i, j) {
stopifnot(length(i) == length(j))
temp = x[i]
x[i] = x[j]
x[j] = temp
return(x)
}
On your data, it should work like this to give the desired result:
your_data$x = swap(your_data$x, which.min(your_data$x), which.max(your_data$x))
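Since the question is phrased in terms of timestamps rather than min and max, here is a minimal sketch that picks the two positions by their Date_Time value instead. The `from` and `to` timestamps are hypothetical choices for illustration, and the data frame is rebuilt from the question's sample so the block runs on its own.

```r
# Sample data from the question
your_data <- data.frame(
  Date_Time = c("01-01-2016", "02-01-2016", "03-01-2016", "04-01-2016"),
  x = c(1, 3, 5, 2),
  y = c(2, 4, 6, 5)
)

# Same swap() as in the answer above
swap <- function(x, i, j) {
  stopifnot(length(i) == length(j))
  temp <- x[i]
  x[i] <- x[j]
  x[j] <- temp
  x
}

# Hypothetical choice: the two timestamps whose x values should trade places
from <- "01-01-2016"
to   <- "03-01-2016"

# match() turns each timestamp into its row index
i <- match(from, your_data$Date_Time)
j <- match(to, your_data$Date_Time)
stopifnot(!is.na(i), !is.na(j))  # both timestamps must exist in the data

your_data$x <- swap(your_data$x, i, j)
your_data
```

Only the two selected cells of x change; every other cell, including all of y, is left as it was.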

Two other options with dplyr:
library(dplyr)
df %>%
mutate(x = case_when(
x == max(x) ~ min(x),
x == min(x) ~ max(x),
TRUE ~ x
))
df %>%
mutate(x = replace(x, c(which.max(x), which.min(x)), c(min(x), max(x))))
Result:
Date_Time x y
1 01-01-2016 5 2
2 02-01-2016 3 4
3 03-01-2016 1 6
4 04-01-2016 2 5
To shift x vertically:
df %>%
mutate(x = c(x[-1], x[1]))
or
df %>%
mutate(x = c(x[length(x)], x[-length(x)]))
Result:
> df %>%
+ mutate(x = c(x[-1], x[1]))
Date_Time x y
1 01-01-2016 3 2
2 02-01-2016 5 4
3 03-01-2016 2 6
4 04-01-2016 1 5
> df %>%
+ mutate(x = c(x[length(x)], x[-length(x)]))
Date_Time x y
1 01-01-2016 2 2
2 02-01-2016 1 4
3 03-01-2016 3 6
4 04-01-2016 5 5
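The two one-step shifts above can be generalised to any number of positions. The following is only a sketch: `rotate` is a hypothetical helper name, not a function from dplyr or base R.

```r
# Hypothetical helper: rotate a vector by k positions, wrapping around.
# k > 0 moves values toward later rows, k < 0 moves them toward earlier rows.
rotate <- function(x, k = 1) {
  n <- length(x)
  if (n == 0) return(x)
  k <- k %% n                 # wrap k into 0..(n - 1)
  if (k == 0) return(x)
  c(x[(n - k + 1):n], x[1:(n - k)])
}

x <- c(1, 3, 5, 2)
down_one <- rotate(x, 1)    # same as c(x[length(x)], x[-length(x)])
up_one   <- rotate(x, -1)   # same as c(x[-1], x[1])
down_one
up_one
```

Inside a pipeline this could be used as `mutate(x = rotate(x, 2))`. Because the values wrap around, no NA is introduced.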
Data:
df = read.table(text = "Date_Time | x | y
01-01-2016 | 1 | 2
02-01-2016 | 3 | 4
03-01-2016 | 5 | 6
04-01-2016 | 2 | 5", header = TRUE, sep = "|")

Related

Create a pivot table with multiple hierarchical column groups

I'm trying to create a pivot table (to later be rendered in markdown). However, I can't find a way to produce multiple pivot columns.
my data:
| ID | group | var1 | var2 |
| -: |:-----:|:------:|:------:|
| 1 | A | 1 | 2 |
| 2 | B | 3 | 4 |
| 3 | C | 5 | 6 |
| 4 | A | 7 | 8 |
| 5 | B | 9 | 10 |
| 6 | C | 11 | 12 |
required table:
| | groupA | groupB | groupC |
| ID | var1 | var2 | var1 | var2 | var1 | var2 |
| -: |:------:|:------:|:------:|:------:|:------:|:------:|
| 1 | 1 | 2 | | | | |
| 2 | | | 3 | 4 | | |
| 3 | | | | | 5 | 6 |
| 4 | 7 | 8 | | | | |
| 5 | | | 9 | 10 | | |
| 6 | | | | | 11 | 12 |
Obviously the result is not a dataframe or a tibble.
How can such a table be created?
If this is your example data df:
df <- structure(list(ID = 1:6, group = c("A", "B", "C", "A", "B", "C"
), var1 = c(1, 3, 5, 7, 9, 11), var2 = c(2, 4, 6, 8, 10, 12)), class = "data.frame", row.names = c(NA,
-6L))
... you can generate the table structure and column headers like this:
library(tidyr)
df %>%
pivot_longer(cols = starts_with('var'),
names_to = 'var_name',
values_to = 'value'
) %>%
pivot_wider(id_cols = ID,
names_from = c('group', 'var_name'),
names_sep = '\n', ## wrap line after group name
values_from = 'value'
)
Note that AFAIK having the group names span the variable columns would require some separate fiddling between the steps of reshaping your data (see above) and producing the markdown.
Building on @I_O's data transformation, you could add the group header with the kableExtra package, i.e.
library(dplyr)
library(tidyr)
library(kableExtra)
options(knitr.kable.NA = '')
df %>%
pivot_longer(cols = starts_with('var'),
names_to = 'var_name',
values_to = 'value'
) %>% pivot_wider(id_cols = ID,
names_from = c('group', 'var_name'),
names_sep = '\n', ## wrap line after group name
values_from = 'value'
) %>%
kbl(col.names = c("ID", "var1", "var2","var1", "var2","var1", "var2")) %>%
add_header_above(c(" ", "groupA" = 2,"groupB" = 2,"groupC" = 2 )) %>%
kable_styling(bootstrap_options = "striped", full_width = FALSE)
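The column names and the group header in the kableExtra call are typed out by hand. As a rough sketch, both vectors could instead be derived from the names that pivot_wider produces. This assumes the names have the form group, newline, variable, as in the transformation above; `wide_names`, `col_names` and `header` are illustrative names only.

```r
# Column names as produced by pivot_wider(names_sep = "\n") in the answer above
wide_names <- c("ID", "A\nvar1", "A\nvar2", "B\nvar1", "B\nvar2", "C\nvar1", "C\nvar2")

# Split every name (except ID) into its group part and its variable part
parts  <- strsplit(wide_names[-1], "\n", fixed = TRUE)
groups <- vapply(parts, `[`, character(1), 1)
vars   <- vapply(parts, `[`, character(1), 2)

# Lower header row: the plain variable names
col_names <- c("ID", vars)

# Upper header row: one entry per group, spanning as many columns as it has variables
runs   <- rle(groups)
header <- c(" " = 1, setNames(runs$lengths, paste0("group", runs$values)))

col_names
header
```

These would then be passed as `kbl(col.names = col_names)` and `add_header_above(header)`, so the table keeps working if more groups or variables are added.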
Using reshape2 (in this example the ID and group columns are named grp1 and grp2):
library(reshape2)
dcast(
melt(
df,
id.vars=c("grp1","grp2"),
measure.vars=c("var1","var2")
),
grp1~grp2+variable,
value.var="value"
)
grp1 A_var1 A_var2 B_var1 B_var2 C_var1 C_var2
1 1 1 2 NA NA NA NA
2 2 NA NA 3 4 NA NA
3 3 NA NA NA NA 5 6
4 4 7 8 NA NA NA NA
5 5 NA NA 9 10 NA NA
6 6 NA NA NA NA 11 12
There are two separate issues here. One is how to print a hierarchical table in R. There are a few ways to do this, mostly producing LaTeX or HTML tables. For printing a hierarchical table in the R console, one option is to use tabular from the tables package:
library(tables)
library(dplyr)
fm <- function(x) if(length(x) == 0) "" else x
tabular( (ID) ~ group*(var1 + var2)*(`---`=fm),
data=mutate(df, ID = factor(ID), group = factor(group)))
#>
#> group
#> A B C
#> var1 var2 var1 var2 var1 var2
#> ID --- --- --- --- --- ---
#> 1 1 2
#> 2 3 4
#> 3 5 6
#> 4 7 8
#> 5 9 10
#> 6 11 12
The second, perhaps more important issue is how to store and work with hierarchical tabular structures. This is possible with nested tibbles. In your case, we can do something like:
library(tidyr)
nested_df <- complete(df, ID, group) %>%
nest_by(ID, group) %>%
pivot_wider(names_from = group, values_from = data)
nested_df
#> # A tibble: 6 x 4
#> ID A B C
#> <int> <list<tibble[,2]>> <list<tibble[,2]>> <list<tibble[,2]>>
#> 1 1 [1 x 2] [1 x 2] [1 x 2]
#> 2 2 [1 x 2] [1 x 2] [1 x 2]
#> 3 3 [1 x 2] [1 x 2] [1 x 2]
#> 4 4 [1 x 2] [1 x 2] [1 x 2]
#> 5 5 [1 x 2] [1 x 2] [1 x 2]
#> 6 6 [1 x 2] [1 x 2] [1 x 2]
To access, say, the var1 and var2 columns for group A we would do:
nested_df %>% select(A) %>% unnest(A)
# A tibble: 6 x 2
var1 var2
<dbl> <dbl>
1 1 2
2 NA NA
3 NA NA
4 7 8
5 NA NA
6 NA NA
Created on 2022-05-25 by the reprex package (v2.0.1)

How can I organize the code in a long format depending on which time of measurement

I have a question about converting a dataframe from a wide format into a long format. I haven't found any solutions that fit with my dataframe. We had three measurement timeslots with the same questionnaires (e.g. PANAS and two more questionnaires). My dataframe looks like this right now:
| code| PANAS_1| PANAS_2| PANAS1_1| PANAS1_2| PANAS2_1| PANAS2_2|
|CAPQ | 4 | 3 | 1 | 5 | 2 | 4 |
|BANI | 2 | 3 | 4 | 4 | 3 | 2 |
I want to put it into a format that looks like this:
| code| timeslot| PANAS_1| PANAS_2 |
|CAPQ | 1 | 4 | 3 |
|CAPQ | 2 | 1 | 5 |
|CAPQ | 3 | 2 | 4 |
|BANI | 1 | 2 | 3 |
|BANI | 2 | 4 | 4 |
|BANI | 3 | 3 | 2 |
I tried melt(), but I just don't know what to do because the variable names of the questionnaires aren't the same: the variables in the first timeslot are plain "PANAS_1", the ones in the second timeslot have a 1 before the underscore ("PANAS1_1"), and the ones in the third timeslot have a 2 ("PANAS2_1"). On top of that, I have no variable that indicates which timeslot the items belong to.
I hope you can understand my problem and help me solve this. If you need further information, just let me know.
Here is an approach using data.table. With melt.data.table() you can use groups of measure.vars. In this case you can use patterns() to find the groups by their suffix.
library(data.table)
df <- read.table(text = "code| PANAS_1| PANAS_2| PANAS1_1| PANAS1_2| PANAS2_1| PANAS2_2
CAPQ | 4 | 3 | 1 | 5 | 2 | 4
BANI | 2 | 3 | 4 | 4 | 3 | 2
", sep = "|", header = TRUE)
setDT(df)
DT.long <- melt(df,
id.vars = "code",
measure.vars = patterns("_1", "_2"),
variable.name = "timeslot",
value.name = c("PANAS_1", "PANAS_2")
)[order(code), ]
DT.long
#> code timeslot PANAS_1 PANAS_2
#> 1: BANI 1 2 3
#> 2: BANI 2 4 4
#> 3: BANI 3 3 2
#> 4: CAPQ 1 4 3
#> 5: CAPQ 2 1 5
#> 6: CAPQ 3 2 4
Created on 2021-08-19 by the reprex package (v2.0.1)
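If data.table is not available, the same grouping of columns can be spelled out with base R's reshape(). This is a sketch under the assumption that the columns are named exactly as in the question; the groups are listed by hand rather than found by pattern.

```r
# Sample data from the question
df <- data.frame(
  code = c("CAPQ", "BANI"),
  PANAS_1 = c(4, 2), PANAS_2 = c(3, 3),
  PANAS1_1 = c(1, 4), PANAS1_2 = c(5, 4),
  PANAS2_1 = c(2, 3), PANAS2_2 = c(4, 2)
)

long <- reshape(
  df,
  direction = "long",
  idvar = "code",
  # One vector per output column, each ordered by timeslot
  varying = list(
    c("PANAS_1", "PANAS1_1", "PANAS2_1"),
    c("PANAS_2", "PANAS1_2", "PANAS2_2")
  ),
  v.names = c("PANAS_1", "PANAS_2"),
  timevar = "timeslot",
  times = 1:3
)

# Sort by code and timeslot, and drop the generated row names
long <- long[order(long$code, long$timeslot), ]
rownames(long) <- NULL
long
```

Listing the columns explicitly is more verbose than patterns(), but it makes the assumed timeslot order visible in the code.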
Here is one approach using tidyverse. You can use pivot_longer to put into long format, and separate out the last number after the underscore. Then, you can add a timeslot variable for each code/number combination, assuming the times are in order. Finally, you can revert to wide format with pivot_wider (or leave as is for further processing/analysis).
library(tidyverse)
df %>%
pivot_longer(cols = -code, names_to = c("var", "PANAS"), names_sep = "_") %>%
group_by(code, PANAS) %>%
mutate(timeslot = 1:n()) %>%
pivot_wider(id_cols = c(code, timeslot), names_from = PANAS, names_prefix = "PANAS_", values_from = value)
Output
code timeslot PANAS_1 PANAS_2
<chr> <int> <dbl> <dbl>
1 CAPQ 1 4 3
2 CAPQ 2 1 5
3 CAPQ 3 2 4
4 BANI 1 2 3
5 BANI 2 4 4
6 BANI 3 3 2
Alternatively, you can rename your column names and include the time inside them explicitly:
names(df) <- c("code", paste("PANAS", rep(1:3, each = 2), rep(1:2, times = 3), sep = "_"))
df %>%
pivot_longer(cols = -code, names_to = c("timeslot", "PANAS"), names_pattern = "PANAS_(\\d+)_(\\d+)") %>%
pivot_wider(id_cols = c(code, timeslot), names_from = PANAS, names_prefix = "PANAS_", values_from = value)

How to assign a number between 1 and n in R to rows?

I would like to randomly assign the individuals in my data to a group numbered 1 through 3. How would I do this? (A dplyr solution is preferred.) Individuals (rows with the same id #) must be in the same group.
_______________________
id # | group_id |
454452 | 1 |
5450441 | 2 |
5444531 | 3 |
5444531 | 3 |
5404501 | 1 |
5404041 | 2 |
5404041 | 2 |
254252 | 3 |
541254 | 2 |
_______________________
A simple solution might be:
df <- df %>% group_by(id) %>% mutate(group_id = sample(1:3,1))
which (using set.seed(12345)) resulted in:
id group_id
1 454452 3
2 5450441 1
3 5444531 2
4 5444531 2
5 5404501 2
6 5404041 3
7 5404041 3
8 254252 2
9 541254 2
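For comparison, the same idea (one random draw per distinct id, shared by all of its rows) can be sketched in base R with a lookup vector. The data frame is rebuilt from the question's ids so the block runs on its own, and the seed is only there to make the sketch repeatable.

```r
set.seed(12345)  # only so the sketch is repeatable

df <- data.frame(id = c(454452, 5450441, 5444531, 5444531, 5404501,
                        5404041, 5404041, 254252, 541254))

ids <- unique(df$id)

# One random draw from 1..3 per distinct id
lookup <- sample(1:3, length(ids), replace = TRUE)

# match() maps every row back to the draw made for its id
df$group_id <- lookup[match(df$id, ids)]
df
```

Rows that share an id always receive the same group_id, because the draw is made once per distinct id rather than once per row.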
Here's one option:
library(dplyr)
df <-
tibble(ids = c(100, 200, 200, 300, 300, 400))
distinct_ids <-
df %>%
select(ids) %>%
distinct() %>%
mutate(group_num = sample.int(3, size = nrow(.), replace = TRUE))
df %>%
left_join(distinct_ids, by = "ids")
# A tibble: 6 x 2
ids group_num
<dbl> <int>
1 100 3
2 200 1
3 200 1
4 300 3
5 300 3
6 400 2
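In base R, we could shuffle the unique ids, use them as factor levels, and convert the factor with as.numeric. Note that this numbers each distinct id from 1 to the number of distinct ids (7 here) rather than assigning one of three groups.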
set.seed(42) # for sake of reproducibility
dat <- transform(dat, group_id=as.numeric(factor(id, levels=sample(unique(dat$id)))))
dat
# id X1 X2 X3 group_id
# 1 454452 -1.1045994 0.0356312 1.93557177 1
# 2 5450441 0.5390238 1.3149588 1.72323080 5
# 3 5444531 0.5802063 0.9781675 0.35840206 6
# 4 5444531 -0.6575028 0.8817912 0.30243092 6
# 5 5404501 1.5548955 0.4822047 -0.39411451 7
# 6 5404041 -1.1876414 0.9657529 0.78814062 2
# 7 5404041 0.1518129 -0.8145709 0.67070383 2
# 8 254252 -1.0861326 0.2839578 -0.94918081 4
# 9 541254 1.6133728 -0.1616986 0.03613574 3
Data
dat <- structure(list(id = c(454452L, 5450441L, 5444531L, 5444531L,
5404501L, 5404041L, 5404041L, 254252L, 541254L), X1 = c(-1.10459944068306,
0.539023801893912, 0.580206320853481, -0.657502835154674, 1.55489554810057,
-1.18764140164182, 0.151812914504533, -1.08613257605253, 1.61337280035418
), X2 = c(0.0356311982051355, 1.31495884897891, 0.978167526364279,
0.881791226863203, 0.482204688262918, 0.965752878105794, -0.814570938270238,
0.283957806364306, -0.161698647607024), X3 = c(1.93557176599585,
1.72323079854894, 0.358402056802064, 0.3024309248682, -0.394114506412192,
0.788140622823556, 0.67070382675052, -0.949180809687611, 0.0361357384849679
)), class = "data.frame", row.names = c(NA, -9L))
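The base R approach above gives every distinct id its own number. If exactly three groups of roughly equal size are wanted, one possible variation is to shuffle a repeating 1, 2, 3 pattern over the distinct ids. This is a sketch, and it balances the number of ids per group, not the number of rows.

```r
set.seed(42)  # for the sake of reproducibility

ids <- c(454452, 5450441, 5444531, 5444531, 5404501,
         5404041, 5404041, 254252, 541254)
u <- unique(ids)

# rep_len() cycles 1, 2, 3, 1, 2, ... over the distinct ids; sample() shuffles it
labels <- sample(rep_len(1:3, length(u)))

# Look up each row's label through its id
group_id <- labels[match(ids, u)]
data.frame(id = ids, group_id)
```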

Add Previous Row to Corresponding Column by Group in R

I will post a reproducible Example.
id <- c(1,1,1,1,2,2,1,1)
group <- c("a","b","c","d","a","b","c","d")
df <- data.frame(id, group)
I want something like this as end result.
+====+========+========+
| id | group1 | group2 |
+====+========+========+
| 1 | a | b |
+----+--------+--------+
| 1 | b | c |
+----+--------+--------+
| 1 | c | d |
+----+--------+--------+
| 1 | d | - |
+----+--------+--------+
| 2 | a | b |
+----+--------+--------+
| 2 | b | - |
+----+--------+--------+
| 1 | c | d |
+----+--------+--------+
| 1 | d | - |
+----+--------+--------+
Note that the order of the IDs matters. I also have another column with a timestamp.
One solution with dplyr and rleid from data.table:
library(dplyr)
df %>%
mutate(id2 = data.table::rleid(id)) %>%
group_by(id2) %>%
mutate(group2 = lead(group))
# A tibble: 8 x 4
# Groups: id2 [3]
id group id2 group2
<dbl> <fct> <int> <fct>
1 1.00 a 1 b
2 1.00 b 1 c
3 1.00 c 1 d
4 1.00 d 1 NA
5 2.00 a 2 b
6 2.00 b 2 NA
7 1.00 c 3 d
8 1.00 d 3 NA
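The same run-based grouping can be sketched without dplyr or data.table. The run index below plays the role of rleid, and ave() applies the shift within each run. It assumes a numeric id so that diff() can detect changes; for other id types the comparison would need to be written differently.

```r
id <- c(1, 1, 1, 1, 2, 2, 1, 1)
group <- c("a", "b", "c", "d", "a", "b", "c", "d")
df <- data.frame(id, group, stringsAsFactors = FALSE)

# A new run starts wherever id differs from the previous row
run <- cumsum(c(TRUE, diff(df$id) != 0))

# Within each run, shift group up by one row and pad the end with NA
df$group2 <- ave(df$group, run, FUN = function(v) c(v[-1], NA))
df
```

Grouping by run rather than by id is what keeps the second block of id 1 (rows 7 and 8) separate from the first.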
If I understood your question correctly, you can use the following function:
id <- c(1,1,1,1,2,2,1,1)
group <- c("a","b","c","d","a","b","c","d")
df <- data.frame(id, group)
add_group2 <- function(df) {
n <- nrow(df)
group2 <- as.character(df$group[2:n])
group2 <- c(group2, "-")
group2[which(c(df$id[-n] - c(df$id[2:n]), 0) != 0)] <- "-"
return(data.frame(df, group2))
}
add_group2(df)
Result should be:
id group group2
1 1 a b
2 1 b c
3 1 c d
4 1 d -
5 2 a b
6 2 b -
7 1 c d
8 1 d -

R: group-wise min or max

There are so many posts on how to get the group-wise min or max with SQL. But how do you do it in R?
Let's say, you have got the following data frame
ID | t | value
a | 1 | 3
a | 2 | 5
a | 3 | 2
a | 4 | 1
a | 5 | 5
b | 2 | 2
b | 3 | 1
b | 4 | 5
For every ID, I don't want the min t, but the value at the min t.
ID | value
a | 3
b | 2
df is your data.frame -
library(data.table)
setDT(df) # convert to data.table in place
df[, value[which.min(t)], by = ID]
Output -
> df[, value[which.min(t)], by = ID]
ID V1
1: a 3
2: b 2
You are looking for tapply:
df <- read.table(textConnection("
ID | t | value
a | 1 | 3
a | 2 | 5
a | 3 | 2
a | 4 | 1
a | 5 | 5
b | 2 | 2
b | 3 | 1
b | 4 | 5"), header=TRUE, sep="|")
m <- tapply(1:nrow(df), df$ID, function(i) {
df$value[i[which.min(df$t[i])]]
})
# a b
# 3 2
Two more solutions (with sgibb's df):
sapply(split(df, df$ID), function(x) x$value[which.min(x$t)])
#a b
#3 2
library(plyr)
ddply(df, .(ID), function(x) x$value[which.min(x$t)])
# ID V1
#1 a 3
#2 b 2
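Another base R way to get the value at the minimum t, sketched here on the question's data, is to sort by ID and t and keep the first row of each ID. With ties in t, this keeps whichever tied row comes first in the original order.

```r
df <- data.frame(
  ID = c("a", "a", "a", "a", "a", "b", "b", "b"),
  t = c(1, 2, 3, 4, 5, 2, 3, 4),
  value = c(3, 5, 2, 1, 5, 2, 1, 5)
)

# Sort so that the smallest t of each ID comes first
sorted <- df[order(df$ID, df$t), ]

# duplicated() is FALSE for the first row of each ID, so this keeps exactly that row
first <- sorted[!duplicated(sorted$ID), c("ID", "value")]
rownames(first) <- NULL
first
```

For the group-wise maximum, sorting with `order(df$ID, -df$t)` would put the largest t first instead.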
