Is there any R function to make this happen?

Hi, this is data in Excel form that I want to be able to create in R.
To be clear: every time an observation date has 5 in its Group column, I need the Group_fix column to equal 5 for the following 12 months of observations.
How can I do this in R? Can the ifelse function be used?

Here is an approach with lag from dplyr.
library(dplyr)
# current row plus lags 1-11 = 12 monthly observations
data %>%
  mutate(GroupFix = case_when(Group == 5 |
                                lag(Group, 1) == 5 |
                                lag(Group, 2) == 5 |
                                lag(Group, 3) == 5 |
                                lag(Group, 4) == 5 |
                                lag(Group, 5) == 5 |
                                lag(Group, 6) == 5 |
                                lag(Group, 7) == 5 |
                                lag(Group, 8) == 5 |
                                lag(Group, 9) == 5 |
                                lag(Group, 10) == 5 |
                                lag(Group, 11) == 5 ~ 5,
                              # lag() is NA for the first rows; case_when() treats an NA condition as no match
                              TRUE ~ as.numeric(Group)))
Observation.Date Group GroupFix
1 12/31/19 1 1
2 1/31/20 2 2
3 2/29/20 2 2
4 3/31/20 2 2
5 4/30/20 3 3
6 5/31/20 4 4
7 6/30/20 5 5
8 7/31/20 5 5
9 8/31/20 4 5
10 9/30/20 3 5
11 10/31/20 2 5
12 11/30/20 3 5
13 12/31/20 4 5
14 1/31/21 5 5
15 2/28/21 5 5
16 3/31/21 4 5
17 4/30/21 3 5
18 5/31/21 2 5
19 6/30/21 1 5
20 7/31/21 1 5
21 8/31/21 1 5
22 9/30/21 1 5
23 10/31/21 1 5
24 11/30/21 1 5
25 12/31/21 1 5
26 1/31/22 1 5
27 2/28/22 1 1
Data
data <- structure(list(Observation.Date = structure(c(8L, 1L, 13L, 14L,
16L, 18L, 20L, 22L, 24L, 26L, 4L, 6L, 9L, 2L, 11L, 15L, 17L,
19L, 21L, 23L, 25L, 27L, 5L, 7L, 10L, 3L, 12L), .Label = c("1/31/20",
"1/31/21", "1/31/22", "10/31/20", "10/31/21", "11/30/20", "11/30/21",
"12/31/19", "12/31/20", "12/31/21", "2/28/21", "2/28/22", "2/29/20",
"3/31/20", "3/31/21", "4/30/20", "4/30/21", "5/31/20", "5/31/21",
"6/30/20", "6/30/21", "7/31/20", "7/31/21", "8/31/20", "8/31/21",
"9/30/20", "9/30/21"), class = "factor"), Group = c(1L, 2L, 2L,
2L, 3L, 4L, 5L, 5L, 4L, 3L, 2L, 3L, 4L, 5L, 5L, 4L, 3L, 2L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)), class = "data.frame", row.names = c(NA,
-27L))
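Since the question asks about ifelse(): the same result can be written in base R as ifelse() over a trailing-window check, which avoids spelling out eleven lag() calls. This is a minimal sketch, assuming the rows are sorted by date with one row per month (as in the sample) and that "the next 12 months" means the current row plus the following 11, which is the window the dplyr answer uses. `fix_group()` is a hypothetical helper name, not a library function.

```r
# Hypothetical helper: returns `target` wherever `target` occurred in the
# current row or any of the previous (window - 1) rows, else the original value.
# Assumes one row per month, already sorted by date.
fix_group <- function(group, target = 5, window = 12) {
  hit <- group == target
  in_window <- vapply(seq_along(group), function(i) {
    any(hit[max(1, i - window + 1):i])
  }, logical(1))
  ifelse(in_window, target, group)
}

# Group column from the question's data
group <- c(1L, 2L, 2L, 2L, 3L, 4L, 5L, 5L, 4L, 3L, 2L, 3L, 4L, 5L,
           5L, 4L, 3L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
group_fix <- fix_group(group)
print(group_fix)
```

On the full data frame this would be used as `data$GroupFix <- fix_group(data$Group)`. Changing `window` adjusts how many observations stay flagged after each 5.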


Choose rows in which the absolute value of subtraction is less than a specified value

Let's say I have this dataframe:
ID X1 X2
1 1 2
2 2 1
3 3 1
4 4 1
5 5 5
6 6 20
7 7 20
8 9 20
9 10 20
dataset <- structure(list(ID = 1:9, X1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L,
10L), X2 = c(2L, 1L, 1L, 1L, 5L, 20L, 20L, 20L, 20L)),
class = "data.frame", row.names = c(NA,
-9L))
I want to select the rows in which the absolute difference between columns X1 and X2 is greater than or equal to 2.
For example, row 4's value is 4 - 1 = 3, so it should be selected.
Row 9's value is 10 - 20 = -10; its absolute value is 10, so it should be selected.
In this case, the selection would be rows 3, 4, 6, 7, 8 and 9.
I tried:
dataset2 = dataset[,abs(dataset- c(dataset[,2])) > 2]
But I get an error.
The operation:
abs(dataset- c(dataset[,2])) > 2
Does flag the values whose difference is greater than 2, but the result only works for my second column and does not select the rows properly.
We can take the difference between the 'X1' and 'X2' columns and use the resulting logical expression in subset() to select the rows:
subset(dataset, abs(X1 - X2) >= 2)
# ID X1 X2
#3 3 3 1
#4 4 4 1
#6 6 6 20
#7 7 7 20
#8 8 9 20
#9 9 10 20
Or using index
subset(dataset, abs(dataset[[2]] - dataset[[3]]) >= 2)
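In the question's attempt, the condition ends up in the column position of `[`. In base R, the same filter works with plain bracket indexing once the logical vector goes in the row position (before the comma). A minimal sketch on the question's data:

```r
dataset <- data.frame(ID = 1:9,
                      X1 = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 9L, 10L),
                      X2 = c(2L, 1L, 1L, 1L, 5L, 20L, 20L, 20L, 20L))

# Logical vector with one entry per row: TRUE where |X1 - X2| >= 2
keep <- abs(dataset$X1 - dataset$X2) >= 2

# The condition goes in the row slot (before the comma), not the column slot
dataset2 <- dataset[keep, ]
print(dataset2)
```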

Convert some rows to columns in r [duplicate]

This question already has answers here:
How to reshape data from long to wide format
(14 answers)
How to sum a variable by group
(18 answers)
Aggregate / summarize multiple variables per group (e.g. sum, mean)
(10 answers)
Closed 5 years ago.
I have a dataset like the one below:
test <- structure(list(SR = c(1L, 1L, 15L, 20L, 20L, 96L, 110L, 110L,
121L, 121L, 130L, 130L, 143L, 143L), Area = structure(c(3L, 3L,
1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 4L, 4L, 2L, 2L), .Label = c("FH",
"MO", "TSC", "WMB"), class = "factor"), Period = structure(c(1L,
2L, 1L, 1L, 2L, 1L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("First",
"Second"), class = "factor"), count = c(4L, 6L, 3L, 6L, 6L, 3L,
6L, 6L, 6L, 6L, 6L, 6L, 5L, 6L), countTotal = c(10L, 10L, 3L,
12L, 12L, 3L, 12L, 12L, 12L, 12L, 12L, 12L, 11L, 11L), SumTotal = c(1520,
5769.02, 29346.78, 13316.89, 11932.68, 10173.05, 13243.5, 17131.94,
111189.07, 84123.52, 79463.1, 120010.57, 7035.88, 11520)), .Names = c("SR",
"Area", "Period", "count", "countTotal", "SumTotal"), class = "data.frame", row.names = c(NA,
-14L))
SR Area Period count countTotal SumTotal
1 TSC First 4 10 1520.00
1 TSC Second 6 10 5769.02
15 FH First 3 3 29346.78
20 FH First 6 12 13316.89
20 FH Second 6 12 11932.68
96 FH First 3 3 10173.05
110 MO First 6 12 13243.50
110 MO Second 6 12 17131.94
121 FH First 6 12 111189.07
121 FH Second 6 12 84123.52
130 WMB First 6 12 79463.10
130 WMB Second 6 12 120010.57
143 MO First 5 11 7035.88
143 MO Second 6 11 11520.00
I want to convert some of the rows to columns to make the dataset look like this:
SR Area countTotal First.Count Second.Count First.SumTotal Second.SumTotal
1 TSC 10 4 6 1520.00 5769.02
15 FH 3 3 NA 29346.78 NA
20 FH 12 6 6 13316.89 11932.68
96 FH 3 3 NA 10173.05 NA
110 MO 12 6 6 13243.50 17131.94
121 FH 12 6 6 111189.07 84123.52
130 WMB 12 6 6 79463.10 120010.57
143 MO 11 5 6 7035.88 11520.00
I was trying to use spread from tidyr with this code:
test %>% spread(Period, SumTotal)
but I still get two lines for each SR and Area.
Can someone help?
You need to first gather the columns you want to spread, combine the Period column with the resulting variable column, and then spread that combined column:
library(dplyr)
library(tidyr)
test %>%
  gather(variable, value, count:SumTotal) %>%
  unite("variable", Period, variable, sep = ".") %>%
  spread(variable, value)
Result:
SR Area First.count First.countTotal First.SumTotal Second.count Second.countTotal
1 1 TSC 4 10 1520.00 6 10
2 15 FH 3 3 29346.78 NA NA
3 20 FH 6 12 13316.89 6 12
4 96 FH 3 3 10173.05 NA NA
5 110 MO 6 12 13243.50 6 12
6 121 FH 6 12 111189.07 6 12
7 130 WMB 6 12 79463.10 6 12
8 143 MO 5 11 7035.88 6 11
Second.SumTotal
1 5769.02
2 NA
3 11932.68
4 NA
5 17131.94
6 84123.52
7 120010.57
8 11520.00
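The result above splits countTotal into First.countTotal and Second.countTotal, while the desired output keeps a single countTotal column. Treating countTotal as part of the row identifier avoids that. The sketch below swaps in base R's stats::reshape() instead of tidyr and assumes countTotal is constant within each SR (true for the sample data); note that reshape() names the new columns count.First rather than First.count.

```r
test <- data.frame(
  SR = c(1L, 1L, 15L, 20L, 20L, 96L, 110L, 110L, 121L, 121L, 130L, 130L, 143L, 143L),
  Area = c("TSC", "TSC", "FH", "FH", "FH", "FH", "MO", "MO", "FH", "FH",
           "WMB", "WMB", "MO", "MO"),
  Period = c("First", "Second", "First", "First", "Second", "First", "First",
             "Second", "First", "Second", "First", "Second", "First", "Second"),
  count = c(4L, 6L, 3L, 6L, 6L, 3L, 6L, 6L, 6L, 6L, 6L, 6L, 5L, 6L),
  countTotal = c(10L, 10L, 3L, 12L, 12L, 3L, 12L, 12L, 12L, 12L, 12L, 12L, 11L, 11L),
  SumTotal = c(1520, 5769.02, 29346.78, 13316.89, 11932.68, 10173.05, 13243.5,
               17131.94, 111189.07, 84123.52, 79463.1, 120010.57, 7035.88, 11520),
  stringsAsFactors = FALSE
)

# countTotal is part of the row identifier, so it stays a single column;
# count and SumTotal are spread across the two Period values
wide <- reshape(test,
                idvar = c("SR", "Area", "countTotal"),
                timevar = "Period",
                v.names = c("count", "SumTotal"),
                direction = "wide")
rownames(wide) <- NULL
print(wide)
```

SR values with no "Second" row (15 and 96) get NA in the Second columns, as in the desired output.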

R spread vs gather in tidyr

I have a dataframe in the following form:
person currentTest beforeValue afterValue
1 1 A 1.284297055 2.671763513
2 2 A -0.618359548 -2.354926905
3 3 A 0.039457430 -0.091709968
4 4 A -0.448608324 -0.362851832
5 5 A -0.961777124 -1.416284339
6 6 A 0.702471895 2.052181444
7 7 A -0.455222045 -2.125684279
8 8 A -1.231549132 -2.777425148
9 9 A -0.797234990 -0.558306183
10 10 A -0.709734963 -1.244159550
11 1 B -0.472799377 -0.869472343
12 2 B 0.059720737 1.444855389
13 3 B 0.924201532 2.731049485
14 4 B 0.658884183 1.017542475
15 5 B -1.989807256 -4.712671740
16 6 B 0.660241305 1.971232718
17 7 B 0.089636952 -0.564457911
18 8 B -0.828399941 0.507659171
19 9 B -0.838074237 -0.316996942
20 10 B -1.659197101 -3.317623686
...
What I'd like is to get a data frame of:
person A_Before A_After B_Before B_After ...
1 1.284297055 2.671763513 -0.472799377 -0.869472343
2 -0.618359548 -2.354926905 0.059720737 1.444855389
...
I've tried gather and spread, but that's not quite what I need, since new columns have to be created. Any suggestions?
The dput version for easy access is below:
resultsData <- structure(list(person = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L), currentTest = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L), .Label = c("A", "B", "C",
"D", "E", "F"), class = "factor"), beforeValue = c(1.28429705541541,
-0.618359548370402, 0.039457429902531, -0.448608324038257, -0.961777123997687,
0.702471895259405, -0.455222044740939, -1.23154913153736, -0.797234989892673,
-0.709734963076803, -0.47279937661921, 0.0597207367403981, 0.924201531911827,
0.658884182599422, -1.98980725637449, 0.660241304554785, 0.0896369516528346,
-0.828399941497236, -0.838074236572976, -1.65919710134782, 0.577469369909437,
1.92748171699512, -0.245593641496638, 0.126104785456265, -0.559338325961641,
1.29802115505785, 0.719406692531958, 0.969414499181256, -0.814697072724845,
0.86465983690719, -0.709539159817187, 1.02775240926492, -0.50490096148732,
0.40769259465753, -0.868531009656408, 0.949518511358715, 2.32458579520932,
-0.257578702370506, -0.789761851618986, 0.0979274657020477, -0.00803566278013502,
1.42984177159549, 1.45485678109231, -0.956556613290905, 0.443323691839299,
-0.261951072972966, -1.30990441429799, 0.0921741874883992, -1.02612779569131,
0.81550719514697, -0.403037731404182, -0.384422139459082, 0.417074857491798,
-1.37128032791855, -0.0796160137501127, 1.35302483988882, -0.752751140138746,
0.812453275384099, -1.32443072805549, -1.66986584340583), afterValue = c(2.67176351335094,
-2.35492690509713, -0.0917099675669388, -0.362851831626841, -1.4162843393352,
2.05218144382074, -2.12568427901904, -2.77742514848958, -0.558306182843248,
-1.24415954975022, -0.869472343362331, 1.44485538931333, 2.73104948477609,
1.01754247530805, -4.71267174035743, 1.9712327179732, -0.564457911016569,
0.507659170771878, -0.31699694238194, -3.31762368638082, 1.09068172988414,
4.37537723545199, -0.116850493406969, 1.9533832597394, -1.69003563933244,
2.62250581307257, -0.00837379068728961, 1.84192937988371, -0.675899868505659,
2.08506660046288, -0.583526785879512, 0.699298693972492, -1.26172199141024,
1.23589313451783, -1.56008919968504, 0.436686458587792, 0.11699090169902,
-1.07206510594109, 1.21204947218164, -0.812406581646911, 0.50373332256566,
-0.084945367568491, -0.236015748624917, -0.479606239480476, -0.596799139055039,
-0.562575023441403, -0.339935276865152, -0.213813544612318, -0.265296303857373,
-1.12545083569158, 0.0105156062602101, 0.635695183644557, 0.767433440961415,
0.16648012185356, 0.544633089427927, -0.904001384160196, -0.429299134808951,
0.764224744168297, -0.166062348771635, -0.101892580202475)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -60L), .Names = c("person",
"currentTest", "beforeValue", "afterValue"))
We can use dcast from reshape2
library(reshape2)
meltdf <- melt(resultsData, id.vars=1:2)
dcast(meltdf, person ~ currentTest + variable)
> dcast(meltdf, person ~ currentTest + variable)
person A_beforeValue A_afterValue B_beforeValue B_afterValue C_beforeValue C_afterValue D_beforeValue D_afterValue E_beforeValue
1 1 1.28429706 2.67176351 -0.47279938 -0.8694723 0.5774694 1.090681730 -0.70953916 -0.5835268 -0.008035663
2 2 -0.61835955 -2.35492691 0.05972074 1.4448554 1.9274817 4.375377235 1.02775241 0.6992987 1.429841772
3 3 0.03945743 -0.09170997 0.92420153 2.7310495 -0.2455936 -0.116850493 -0.50490096 -1.2617220 1.454856781
4 4 -0.44860832 -0.36285183 0.65888418 1.0175425 0.1261048 1.953383260 0.40769259 1.2358931 -0.956556613
5 5 -0.96177712 -1.41628434 -1.98980726 -4.7126717 -0.5593383 -1.690035639 -0.86853101 -1.5600892 0.443323692
6 6 0.70247190 2.05218144 0.66024130 1.9712327 1.2980212 2.622505813 0.94951851 0.4366865 -0.261951073
7 7 -0.45522204 -2.12568428 0.08963695 -0.5644579 0.7194067 -0.008373791 2.32458580 0.1169909 -1.309904414
8 8 -1.23154913 -2.77742515 -0.82839994 0.5076592 0.9694145 1.841929380 -0.25757870 -1.0720651 0.092174187
9 9 -0.79723499 -0.55830618 -0.83807424 -0.3169969 -0.8146971 -0.675899869 -0.78976185 1.2120495 -1.026127796
10 10 -0.70973496 -1.24415955 -1.65919710 -3.3176237 0.8646598 2.085066600 0.09792747 -0.8124066 0.815507195
E_afterValue F_beforeValue F_afterValue
1 0.50373332 -0.40303773 0.01051561
2 -0.08494537 -0.38442214 0.63569518
3 -0.23601575 0.41707486 0.76743344
4 -0.47960624 -1.37128033 0.16648012
5 -0.59679914 -0.07961601 0.54463309
6 -0.56257502 1.35302484 -0.90400138
7 -0.33993528 -0.75275114 -0.42929913
8 -0.21381354 0.81245328 0.76422474
9 -0.26529630 -1.32443073 -0.16606235
10 -1.12545084 -1.66986584 -0.10189258
You can use a combined gather + spread approach: gather the *Value columns, combine the key with currentTest to form the new header, then spread to wide format:
library(dplyr)
library(tidyr)
resultsData %>%
  gather(key, value, -person, -currentTest) %>%
  unite(header, c('currentTest', 'key'), sep = "_") %>%
  spread(header, value)
# A tibble: 10 x 13
# person A_afterValue A_beforeValue B_afterValue B_beforeValue C_afterValue C_beforeValue
# * <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
# 1 1 2.67176351 1.28429706 -0.8694723 -0.47279938 1.090681730 0.5774694
# 2 2 -2.35492691 -0.61835955 1.4448554 0.05972074 4.375377235 1.9274817
# 3 3 -0.09170997 0.03945743 2.7310495 0.92420153 -0.116850493 -0.2455936
# 4 4 -0.36285183 -0.44860832 1.0175425 0.65888418 1.953383260 0.1261048
# 5 5 -1.41628434 -0.96177712 -4.7126717 -1.98980726 -1.690035639 -0.5593383
# 6 6 2.05218144 0.70247190 1.9712327 0.66024130 2.622505813 1.2980212
# 7 7 -2.12568428 -0.45522204 -0.5644579 0.08963695 -0.008373791 0.7194067
# 8 8 -2.77742515 -1.23154913 0.5076592 -0.82839994 1.841929380 0.9694145
# 9 9 -0.55830618 -0.79723499 -0.3169969 -0.83807424 -0.675899869 -0.8146971
#10 10 -1.24415955 -0.70973496 -3.3176237 -1.65919710 2.085066600 0.8646598
# ... with 6 more variables: D_afterValue <dbl>, D_beforeValue <dbl>, E_afterValue <dbl>,
# E_beforeValue <dbl>, F_afterValue <dbl>, F_beforeValue <dbl>
If you need to rename the columns (here, dropping the "Value" suffix):
resultsData %>%
  gather(key, value, -person, -currentTest) %>%
  unite(header, c('currentTest', 'key'), sep = "_") %>%
  spread(header, value) %>%
  # rename_with() (dplyr >= 1.0.0) supersedes rename_at() with the deprecated funs()
  rename_with(~ gsub("Value$", "", .x), matches("Value$"))
We could do this in a single line using recast:
reshape2::recast(resultsData, person ~currentTest + variable, id.var = 1:2)
#person A_beforeValue A_afterValue B_beforeValue B_afterValue C_beforeValue C_afterValue D_beforeValue D_afterValue
#1 1 1.28429706 2.67176351 -0.47279938 -0.8694723 0.5774694 1.090681730 -0.70953916 -0.5835268
#2 2 -0.61835955 -2.35492691 0.05972074 1.4448554 1.9274817 4.375377235 1.02775241 0.6992987
#3 3 0.03945743 -0.09170997 0.92420153 2.7310495 -0.2455936 -0.116850493 -0.50490096 -1.2617220
#4 4 -0.44860832 -0.36285183 0.65888418 1.0175425 0.1261048 1.953383260 0.40769259 1.2358931
#5 5 -0.96177712 -1.41628434 -1.98980726 -4.7126717 -0.5593383 -1.690035639 -0.86853101 -1.5600892
#6 6 0.70247190 2.05218144 0.66024130 1.9712327 1.2980212 2.622505813 0.94951851 0.4366865
#7 7 -0.45522204 -2.12568428 0.08963695 -0.5644579 0.7194067 -0.008373791 2.32458580 0.1169909
#8 8 -1.23154913 -2.77742515 -0.82839994 0.5076592 0.9694145 1.841929380 -0.25757870 -1.0720651
#9 9 -0.79723499 -0.55830618 -0.83807424 -0.3169969 -0.8146971 -0.675899869 -0.78976185 1.2120495
#10 10 -0.70973496 -1.24415955 -1.65919710 -3.3176237 0.8646598 2.085066600 0.09792747 -0.8124066
# E_beforeValue E_afterValue F_beforeValue F_afterValue
#1 -0.008035663 0.50373332 -0.40303773 0.01051561
#2 1.429841772 -0.08494537 -0.38442214 0.63569518
#3 1.454856781 -0.23601575 0.41707486 0.76743344
#4 -0.956556613 -0.47960624 -1.37128033 0.16648012
#5 0.443323692 -0.59679914 -0.07961601 0.54463309
#6 -0.261951073 -0.56257502 1.35302484 -0.90400138
#7 -1.309904414 -0.33993528 -0.75275114 -0.42929913
#8 0.092174187 -0.21381354 0.81245328 0.76422474
#9 -1.026127796 -0.26529630 -1.32443073 -0.16606235
#10 0.815507195 -1.12545084 -1.66986584 -0.10189258
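Without reshape2 or tidyr, base R's stats::reshape() can do the widening in one call, although it builds names as value_test (beforeValue_A) rather than test_value, so a rename step follows. This is a minimal sketch on the first two persons and tests A and B from the question's data (values as printed in the question); the regular expression in sub() assumes every value column ends in "Value".

```r
long <- data.frame(
  person      = c(1L, 2L, 1L, 2L),
  currentTest = c("A", "A", "B", "B"),
  beforeValue = c(1.284297055, -0.618359548, -0.472799377, 0.059720737),
  afterValue  = c(2.671763513, -2.354926905, -0.869472343, 1.444855389),
  stringsAsFactors = FALSE
)

# One row per person; beforeValue/afterValue are spread across each test
wide <- reshape(long, idvar = "person", timevar = "currentTest",
                v.names = c("beforeValue", "afterValue"),
                direction = "wide", sep = "_")

# reshape() produces e.g. "beforeValue_A"; reorder it to "A_before"
names(wide) <- sub("^(before|after)Value_(.+)$", "\\2_\\1", names(wide))
print(wide)
```

Applied to the full resultsData, the same call would produce one before/after pair per test A to F.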

Tidy data.frame with repeated column names

I have a program that gives me data in this format
toy
file_path Condition Trial.Num A B C ID A B C ID A B C ID
1 root/some.extension Baseline 1 2 3 5 car 2 1 7 bike 4 9 0 plane
2 root/thing.extension Baseline 2 3 6 45 car 5 4 4 bike 9 5 4 plane
3 root/else.extension Baseline 3 4 4 6 car 7 5 4 bike 68 7 56 plane
4 root/uniquely.extension Treatment 1 5 3 7 car 1 7 37 bike 9 8 7 plane
5 root/defined.extension Treatment 2 6 7 3 car 4 6 8 bike 9 0 8 plane
My goal is to tidy it into a format with unique column names, which would then be easier to finish tidying with reshape:
tidy_toy
file_path Condition Trial.Num A B C ID
1 root/some.extension Baseline 1 2 3 5 car
2 root/thing.extension Baseline 2 3 6 45 car
3 root/else.extension Baseline 3 4 4 6 car
4 root/uniquely.extension Treatment 1 5 3 7 car
5 root/defined.extension Treatment 2 6 7 3 car
6 root/some.extension Baseline 1 2 1 7 bike
7 root/thing.extension Baseline 2 5 4 4 bike
8 root/else.extension Baseline 3 7 5 4 bike
9 root/uniquely.extension Treatment 1 1 7 37 bike
10 root/defined.extension Treatment 2 4 6 8 bike
11 root/some.extension Baseline 1 4 9 0 plane
12 root/thing.extension Baseline 2 9 5 4 plane
13 root/else.extension Baseline 3 68 7 56 plane
14 root/uniquely.extension Treatment 1 9 8 7 plane
15 root/defined.extension Treatment 2 9 0 8 plane
If I try to melt toy, it doesn't work: only the first ID column gets used for id.vars (so everything gets tagged as car), and identically named variables get dropped.
Here's the dput of both tables
toy <- structure(list(file_path = structure(c(3L, 4L, 2L, 5L, 1L), .Label = c("root/defined.extension",
"root/else.extension", "root/some.extension", "root/thing.extension",
"root/uniquely.extension"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L), .Label = c("Baseline", "Treatment"), class = "factor"),
Trial.Num = c(1L, 2L, 3L, 1L, 2L), A = 2:6, B = c(3L, 6L,
4L, 3L, 7L), C = c(5L, 45L, 6L, 7L, 3L), ID = structure(c(1L,
1L, 1L, 1L, 1L), .Label = "car", class = "factor"), A = c(2L,
5L, 7L, 1L, 4L), B = c(1L, 4L, 5L, 7L, 6L), C = c(7L, 4L,
4L, 37L, 8L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "bike", class = "factor"),
A = c(4L, 9L, 68L, 9L, 9L), B = c(9L, 5L, 7L, 8L, 0L), C = c(0L,
4L, 56L, 7L, 8L), ID = structure(c(1L, 1L, 1L, 1L, 1L), .Label = "plane", class = "factor")), .Names = c("file_path",
"Condition", "Trial.Num", "A", "B", "C", "ID", "A", "B", "C",
"ID", "A", "B", "C", "ID"), class = "data.frame", row.names = c(NA,
-5L))
tidy_toy <- structure(list(file_path = structure(c(3L, 4L, 2L, 5L, 1L, 3L,
4L, 2L, 5L, 1L, 3L, 4L, 2L, 5L, 1L), .Label = c("root/defined.extension",
"root/else.extension", "root/some.extension", "root/thing.extension",
"root/uniquely.extension"), class = "factor"), Condition = structure(c(1L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L, 2L), .Label = c("Baseline",
"Treatment"), class = "factor"), Trial.Num = c(1L, 2L, 3L, 1L,
2L, 1L, 2L, 3L, 1L, 2L, 1L, 2L, 3L, 1L, 2L), A = c(2L, 3L, 4L,
5L, 6L, 2L, 5L, 7L, 1L, 4L, 4L, 9L, 68L, 9L, 9L), B = c(3L, 6L,
4L, 3L, 7L, 1L, 4L, 5L, 7L, 6L, 9L, 5L, 7L, 8L, 0L), C = c(5L,
45L, 6L, 7L, 3L, 7L, 4L, 4L, 37L, 8L, 0L, 4L, 56L, 7L, 8L), ID = structure(c(2L,
2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 3L), .Label = c("bike",
"car", "plane"), class = "factor")), .Names = c("file_path",
"Condition", "Trial.Num", "A", "B", "C", "ID"), class = "data.frame", row.names = c(NA,
-15L))
You can use the make.unique function to create unique column names. After that, you can use melt from the data.table package, which can create multiple value columns based on patterns in the column names:
# make the column names unique
names(toy) <- make.unique(names(toy))
# let the 'Condition' column start with a small letter 'c'
# so it won't be detected by the patterns argument from melt
names(toy)[2] <- tolower(names(toy)[2])
# load the 'data.table' package
library(data.table)
# tidy the data into long format
tidy_toy <- melt(setDT(toy),
measure.vars = patterns('^A','^B','^C','^ID'),
value.name = c('A','B','C','ID'))
which gives:
> tidy_toy
file_path condition Trial.Num variable A B C ID
1: root/some.extension Baseline 1 1 2 3 5 car
2: root/thing.extension Baseline 2 1 3 6 45 car
3: root/else.extension Baseline 3 1 4 4 6 car
4: root/uniquely.extension Treatment 1 1 5 3 7 car
5: root/defined.extension Treatment 2 1 6 7 3 car
6: root/some.extension Baseline 1 2 2 1 7 bike
7: root/thing.extension Baseline 2 2 5 4 4 bike
8: root/else.extension Baseline 3 2 7 5 4 bike
9: root/uniquely.extension Treatment 1 2 1 7 37 bike
10: root/defined.extension Treatment 2 2 4 6 8 bike
11: root/some.extension Baseline 1 3 4 9 0 plane
12: root/thing.extension Baseline 2 3 9 5 4 plane
13: root/else.extension Baseline 3 3 68 7 56 plane
14: root/uniquely.extension Treatment 1 3 9 8 7 plane
15: root/defined.extension Treatment 2 3 9 0 8 plane
Another option is to use a list of column indexes for measure.vars:
tidy_toy <- melt(setDT(toy),
measure.vars = list(c(4,8,12), c(5,9,13), c(6,10,14), c(7,11,15)),
value.name = c('A','B','C','ID'))
Making the column names unique isn't necessary then.
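The hard-coded index list above can also be generated, which helps if the program outputs more repeated blocks. This sketch assumes, as in toy, three leading id columns followed by repeated blocks of four columns (A, B, C, ID); `n_cols` is a stand-in for `ncol(toy)`.

```r
n_id <- 3         # file_path, Condition, Trial.Num
block_size <- 4   # A, B, C, ID
n_cols <- 15      # stands in for ncol(toy)

# The k-th list element collects the k-th column of every block (4, 8, 12 for A, ...)
measure_list <- lapply(seq_len(block_size) - 1, function(k) {
  seq(n_id + 1 + k, n_cols, by = block_size)
})
print(measure_list)
```

It would then be passed as `melt(setDT(toy), measure.vars = measure_list, value.name = c('A','B','C','ID'))`.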
A more complicated method that creates names that are better distinguishable by the patterns argument:
# select the names that are not unique
tt <- table(names(toy))
idx <- which(names(toy) %in% names(tt)[tt > 1])
nms <- names(toy)[idx]
# make them unique
names(toy)[idx] <- paste(nms,
rep(seq(length(nms) / length(names(tt)[tt > 1])),
each = length(names(tt)[tt > 1])),
sep = '.')
# your column names are now unique:
> names(toy)
[1] "file_path" "Condition" "Trial.Num" "A.1" "B.1" "C.1" "ID.1" "A.2"
[9] "B.2" "C.2" "ID.2" "A.3" "B.3" "C.3" "ID.3"
# tidy the data into long format
tidy_toy <- melt(setDT(toy),
measure.vars = patterns('^A.\\d','^B.\\d','^C.\\d','^ID.\\d'),
value.name = c('A','B','C','ID'))
which will give the same end-result.
As mentioned in the comments, the janitor package can be helpful for this problem as well: its clean_names() function works similarly to make.unique(). See here for an explanation.
With the tidyverse we can do:
library(tidyverse)
toy %>%
repair_names(sep="_") %>%
pivot_longer(-(1:3),names_to = c(".value","id"), names_sep="_") %>%
select(-id)
#> # A tibble: 15 x 7
#> file_path Condition Trial.Num A B C ID
#> <fct> <fct> <int> <int> <int> <int> <fct>
#> 1 root/some.extension Baseline 1 2 3 5 car
#> 2 root/some.extension Baseline 1 2 1 7 bike
#> 3 root/some.extension Baseline 1 4 9 0 plane
#> 4 root/thing.extension Baseline 2 3 6 45 car
#> 5 root/thing.extension Baseline 2 5 4 4 bike
#> 6 root/thing.extension Baseline 2 9 5 4 plane
#> 7 root/else.extension Baseline 3 4 4 6 car
#> 8 root/else.extension Baseline 3 7 5 4 bike
#> 9 root/else.extension Baseline 3 68 7 56 plane
#> 10 root/uniquely.extension Treatment 1 5 3 7 car
#> 11 root/uniquely.extension Treatment 1 1 7 37 bike
#> 12 root/uniquely.extension Treatment 1 9 8 7 plane
#> 13 root/defined.extension Treatment 2 6 7 3 car
#> 14 root/defined.extension Treatment 2 4 6 8 bike
#> 15 root/defined.extension Treatment 2 9 0 8 plane
#> Warning message:
#> Expected 2 pieces. Missing pieces filled with `NA` in 4 rows [1, 2, 3, 4].
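A package-free alternative is to cut the table into id-columns-plus-one-block pieces and stack them with rbind(). This sketch rebuilds the question's toy data with check.names = FALSE so the duplicated headers survive, and assumes every repeated block has the same four columns in the same order (A, B, C, ID).

```r
# Rebuild the question's `toy` data; check.names = FALSE keeps the duplicated
# A/B/C/ID headers, and stringsAsFactors = FALSE keeps text as character
toy <- data.frame(
  file_path = c("root/some.extension", "root/thing.extension",
                "root/else.extension", "root/uniquely.extension",
                "root/defined.extension"),
  Condition = c("Baseline", "Baseline", "Baseline", "Treatment", "Treatment"),
  Trial.Num = c(1L, 2L, 3L, 1L, 2L),
  A = 2:6, B = c(3L, 6L, 4L, 3L, 7L), C = c(5L, 45L, 6L, 7L, 3L), ID = "car",
  A = c(2L, 5L, 7L, 1L, 4L), B = c(1L, 4L, 5L, 7L, 6L),
  C = c(7L, 4L, 4L, 37L, 8L), ID = "bike",
  A = c(4L, 9L, 68L, 9L, 9L), B = c(9L, 5L, 7L, 8L, 0L),
  C = c(0L, 4L, 56L, 7L, 8L), ID = "plane",
  check.names = FALSE, stringsAsFactors = FALSE
)

id_cols <- 1:3     # file_path, Condition, Trial.Num
block_size <- 4    # each repeated block is A, B, C, ID

# Assign every non-id column to its block: 1,1,1,1,2,2,2,2,3,3,3,3
value_cols <- setdiff(seq_along(toy), id_cols)
block <- ceiling(seq_along(value_cols) / block_size)

# Take the id columns plus one block at a time; each piece has unique names
pieces <- lapply(split(value_cols, block), function(cols) toy[, c(id_cols, cols)])

# Stack the pieces; rbind() matches the (now identical) column names
tidy_toy <- do.call(rbind, unname(pieces))
rownames(tidy_toy) <- NULL
print(tidy_toy)
```

The row order (all car rows, then bike, then plane) matches the question's tidy_toy.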

R : ddply and return string

I have a dataframe like this:
id col1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 1
6 1 2
7 2 3
8 3 4
I would like to group by id, then create a string that contains the col1 values in descending order, separated by spaces.
I first order the data frame by id and col1, but I am unable to get ddply to return the output as a single string without quotes.
df111 <- df111[order(df111$id, -df111$col1),]
df222 <- ddply(df111, .(id), function(col1) as.character(paste0(col1,sep = ' ')))
id V1 V2
1 1 c(1, 1, 1, 1) c(0.793507214868441, 0.539258575299755, 0.165128685068339, 0.153290810529143)
2 2 c(2, 2, 2, 2) c(0.872032727580518, 0.827515688957646, 0.236087603960186, 0.165240615839139)
3 3 c(3, 3, 3, 3) c(0.759382889838889, 0.484359077410772, 0.182580581633374, 0.0723447729833424)
4 4 c(4, 4, 4, 4) c(0.874859027564526, 0.642130059422925, 0.0569298807531595, 0.0227038362063468)
5 5 c(5, 5, 5, 5) c(0.392553070792928, 0.386064056074247, 0.299609177513048, 0.222290486795828)
I'd like something like this:
id V1
1 1 0.793507214868441 0.539258575299755 0.165128685068339 0.153290810529143
Any suggestions?
EDIT:
> dput(df111)
structure(list(id = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L), col1 = c(0.793507214868441,
0.539258575299755, 0.165128685068339, 0.153290810529143, 0.872032727580518,
0.827515688957646, 0.236087603960186, 0.165240615839139, 0.759382889838889,
0.484359077410772, 0.182580581633374, 0.0723447729833424, 0.874859027564526,
0.642130059422925, 0.0569298807531595, 0.0227038362063468, 0.392553070792928,
0.386064056074247, 0.299609177513048, 0.222290486795828)), .Names = c("id",
"col1"), row.names = c(1L, 11L, 16L, 6L, 7L, 12L, 17L, 2L, 18L,
13L, 8L, 3L, 14L, 9L, 19L, 4L, 20L, 10L, 5L, 15L), class = "data.frame")
I think maybe you just need to use summarise rather than a custom anonymous function...?
library(plyr)
dat <- read.table(text = "id col1
1 1 1
2 2 2
3 3 3
4 4 4
5 5 1
6 1 2
7 2 3
8 3 4",header = TRUE,sep = "")
> ddply(dat,.(id),summarise,val = paste(sort(col1,decreasing = TRUE),collapse = " "))
id val
1 1 2 1
2 2 3 2
3 3 4 3
4 4 4
5 5 1
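The same summary can be produced without plyr using base R's aggregate() with a formula; this is a sketch swapping in aggregate() for ddply(), on the same eight-row example as the answer above.

```r
dat <- data.frame(id   = c(1L, 2L, 3L, 4L, 5L, 1L, 2L, 3L),
                  col1 = c(1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L))

# For each id, sort its col1 values in decreasing order and join them with spaces
res <- aggregate(col1 ~ id, data = dat,
                 FUN = function(x) paste(sort(x, decreasing = TRUE), collapse = " "))
names(res)[2] <- "val"
print(res)
```

Because paste() returns a plain character vector, the result column holds strings like "2 1" with no c(...) wrapper, which was the problem with the anonymous function in the question.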
