Creating row in dataframe for each element in vector - r

I have a vector of numbers:
a <- c(54, 456, 23432, 4868, 34, 245634, 37, 46453, 1342354)
In my existing data frame (head included via dput below), I would like to create a new variable in which each row holds a single element of the vector. So there would be one value (e.g. 54) in each row of the new variable.
structure(list(Phone = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "a", class = "factor"), Frame = structure(c(1L,
3L, 2L, 4L, 6L, 5L), .Label = c("[-4.46225397 -4.14727267 -4.45203785 -4.67251549 -5.13750066 -4.92839463\n -5.03957588 -5.68530479]",
"[-6.14532579 -4.38918589 -4.12275354 -4.19263549 -4.30380823 -4.35621995\n -4.4079389 -4.47339504]",
"[-6.43104195 -4.75506178 -4.2324676 -4.21878988 -4.1635973 -4.11186806\n -4.05023489 -4.08204198]",
"[-7.1528423 -5.46190925 -5.94873845 -6.635839 -6.84179002 -6.85955335\n -6.83714326 -6.87621415]",
"[-7.23901353 -4.61522546 -3.25206619 -3.38407075 -3.63762837 -3.85352927\n -3.94250123 -4.04015791]",
"[-7.34451319 -5.58664694 -4.69929752 -4.621823 -4.51670576 -4.48494125\n -4.39512713 -4.26553646]"
), class = "factor"), Previous = structure(c(1L, 1L, 1L, 1L,
1L, 1L), .Label = "ch", class = "factor"), Following = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "p", class = "factor"), Word = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "juk'ucha-pi", class = "factor"),
Note = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"),
"[-10.79197258 -7.97949955 -7.10253093 -7.07957825 -6.98695923\n -6.90015207 -6.79672506 -6.85010073",
"[-10.31251047 -7.36552088 -6.91841906 -7.0356884 -7.2222481\n -7.31020053 -7.39699043 -7.5068328 ",
"[-12.00323036 -9.16566481 -9.982616 -11.13564383 -11.48125155\n -11.51106031 -11.47345379 -11.5390189 ",
"[-12.32487451 -9.37498793 -7.8859212 -7.7559107 -7.5795128\n -7.52620857 -7.37549093 -7.15802398",
"[-12.14783486 -7.74483933 -5.45731306 -5.67883075 -6.10432742\n -6.46663209 -6.61593651 -6.77981481"
), Morph_status = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "", class = "factor"),
row.names = c(NA, 6L), class = "data.frame")

In a data frame, each variable (column) has as many entries as there are rows. What you are describing, then, is not a data frame and, if I understand your question correctly, the best you can do is to go back to general lists:
df <- data.frame(a = 1:3, b = 1:3)
c(as.list(df), c = list(a))
# $a
# [1] 1 2 3
#
# $b
# [1] 1 2 3
#
# $c
# [1] 54 456 23432 4868 34 245634 37 46453 1342354
Another option, if you still want a data frame, is to pad all the shorter columns with NAs:
library(rowr)
cbind.fill(df, a, fill = NA)
# a b object
# 1 1 1 54
# 2 2 2 456
# 3 3 3 23432
# 4 NA NA 4868
# 5 NA NA 34
# 6 NA NA 245634
# 7 NA NA 37
# 8 NA NA 46453
# 9 NA NA 1342354
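If the rowr package is not available, the same padding can be sketched in base R. This is only an illustration under the assumption that the vector may be longer than the data frame; the names v, n, padded and new are hypothetical and not part of the original answer.

```r
df <- data.frame(a = 1:3, b = 1:3)
v <- c(54, 456, 23432, 4868, 34, 245634, 37, 46453, 1342354)

# Target number of rows: the longer of the data frame and the vector
n <- max(nrow(df), length(v))

# Indexing rows beyond nrow(df) yields rows filled with NA
padded <- df[seq_len(n), , drop = FALSE]
rownames(padded) <- NULL

# Indexing the vector beyond its length also yields NA
padded$new <- v[seq_len(n)]
padded
```

When the vector and the data frame have the same length, n equals nrow(df) and this reduces to a plain column assignment.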

Related

How to add column reporting sum of couple of subsequent rows

I have the following dataset
structure(list(Var1 = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L), .Label = c("0", "1"), class = "factor"), Var2 = structure(c(1L,
1L, 2L, 2L, 1L, 1L, 2L, 2L), .Label = c("congruent", "incongruent"
), class = "factor"), Var3 = structure(c(1L, 1L, 1L, 1L, 2L,
2L, 2L, 2L), .Label = c("spoken", "written"), class = "factor"),
Freq = c(8L, 2L, 10L, 2L, 10L, 2L, 10L, 2L)), class = "data.frame", row.names = c(NA,
-8L))
I would like to add another column reporting sum of coupled subsequent rows. Thus the final result would look like this:
I have proceeded like this
Table = as.data.frame(table(data_1$unimodal,data_1$cong_cond, data_1$presentation_mode)) %>%
mutate(Var1 = factor(Var1, levels = c('0', '1')))
row = Table %>% #is.factor(Table$Var1)
summarise(across(where(is.numeric),
~ .[Var1 == '0'] + .[Var1 == '1'],
.names = "{.col}_sum"))
column = c(rbind(row$Freq_sum,rep(NA, 4)))
Table$column = column
However, I am looking for the quickest way to do this, without splitting the code into separate steps. I have used the dplyr package here, but if you know other ways, for example with map(), a for loop, or whatever method you consider best, please let me know.
This should do:
df$column <-
rep(colSums(matrix(df$Freq, 2)), each=2) * c(1, NA)
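To see why this one-liner works, it may help to unpack it step by step. The sketch below uses only the Freq values from the question; the intermediate names m, pair_sums and column are introduced here for illustration.

```r
Freq <- c(8L, 2L, 10L, 2L, 10L, 2L, 10L, 2L)

# Fill a 2-row matrix column by column: each column holds one pair of
# consecutive rows from the data frame
m <- matrix(Freq, nrow = 2)

# One sum per pair
pair_sums <- colSums(m)

# Repeat each sum twice, then multiply by c(1, NA), which is recycled so
# that the second row of every pair becomes NA
column <- rep(pair_sums, each = 2) * c(1, NA)
column
```

Note that this relies on the rows already being ordered so that each pair is adjacent, and on the number of rows being even.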
If you are fine with no NAs in the data frame, you can group by Var2 and Var3 and sum Freq within each group:
df %>%
group_by(Var2, Var3) %>%
mutate(column = sum(Freq))
# A tibble: 8 × 5
# Groups: Var2, Var3 [4]
Var1 Var2 Var3 Freq column
<fct> <fct> <fct> <int> <int>
1 0 congruent spoken 8 10
2 1 congruent spoken 2 10
3 0 incongruent spoken 10 12
4 1 incongruent spoken 2 12
5 0 congruent written 10 12
6 1 congruent written 2 12
7 0 incongruent written 10 12
8 1 incongruent written 2 12

count number of times string appears in a column

Can you think of an intuitive way to count the number of times the word space appears in a certain column? Any other viable solution is welcome too.
I basically want to know how many times the space key was pressed; however, some participants pressed other keys by mistake, which should also be counted as a mistake. So I was wondering whether I should use the "key_resp.rt" column instead and count the number of response times. If you have any idea how to do both, that would be great, as I may need both.
I used the following code but the results do not conform to the data.
Data %>% group_by(Participant, Session) %>% summarise(false_start = sum(str_count(key_resp.keys, "space")))
Here is a snippet of my data:
Participant RT Session key_resp.keys key_resp.rt
X 0.431265 1 ["space"] [2.3173399999941466]
X 0.217685 1
X 0.317435 2 ["space","space"] [0.6671900000001187,2.032510000000002] 2020.1.3 4
Y 0.252515 1
Y 0.05127 2 ["space","space","space","space","space","space","space","space","space"] [4.917419999999765,6.151149999999689,6.333714999999771,6.638249999999971,6.833514999999338,7.0362499999992,7.217724999999504,7.38576999999988,7.66913999999997]
dput(droplevels(head(Data_PVT)))
structure(list(Interval_stimulus = c(4.157783411, 4.876139922,
5.67011868, 9.338167417, 9.196342656, 7.62448411), Participant = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "ADH80254", class = "factor"),
RT = c(431.265, 277.99, 253.515, 310.53, 299.165, 539.46),
Session = c(1L, 1L, 1L, 1L, 1L, 1L), date = structure(c(1L,
1L, 1L, 1L, 1L, 1L), .Label = "2020-06-12_11h11.47.141", class = "factor"),
key_resp.keys = structure(c(2L, 1L, 1L, 1L, 1L, 1L), .Label = c("",
"[\"space\"]"), class = "factor"), key_resp.rt = structure(c(2L,
1L, 1L, 1L, 1L, 1L), .Label = c("", "[2.3173399999941466]"
), class = "factor"), psychopyVersion = structure(c(1L, 1L,
1L, 1L, 1L, 1L), .Label = "2020.1.3", class = "factor"),
Trials = 0:5, Reciprocal = c(2.31875992719094, 3.59725169970143,
3.94453977082224, 3.22030077609249, 3.3426370063343, 1.85370555740926
)), row.names = c(NA, 6L), class = "data.frame")
Expected output:
Participant Session false_start
x 1 0
x 2 1
y 1 2
y 2 1
z 1 10
z 2 3
We can use str_count to count the "space" values and sum them for each Participant and Session to get the total. For all_false_start, we count every word (any key name) in the column instead.
library(dplyr)
library(stringr)
df %>%
group_by(Participant, Session) %>%
summarise(false_start = sum(str_count(key_resp.keys, '\\bspace\\b')),
all_false_start = sum(str_count(key_resp.keys, '\\b\\w+\\b')))
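The same counting logic can be sketched in base R, which may make it easier to check the result against the data. The toy data below is hypothetical and only modelled on the snippet in the question; count_matches is a helper name introduced here, not a function from any package.

```r
# Hypothetical toy data modelled on the snippet in the question
Data <- data.frame(
  Participant = c("X", "X", "X", "Y", "Y"),
  Session = c(1L, 1L, 2L, 1L, 2L),
  key_resp.keys = c('["space"]', "", '["space","space"]', "",
                    '["space","a","space"]'),
  stringsAsFactors = FALSE
)

# Count regex matches per element; an element with no match counts as 0
count_matches <- function(x, pattern) {
  x <- as.character(x)  # the dput shows factors, so coerce first
  lengths(regmatches(x, gregexpr(pattern, x, perl = TRUE)))
}

Data$false_start <- count_matches(Data$key_resp.keys, "\\bspace\\b")
Data$all_false_start <- count_matches(Data$key_resp.keys, "\\b\\w+\\b")

# Sum the per-row counts within each Participant and Session
res <- aggregate(cbind(false_start, all_false_start) ~ Participant + Session,
                 data = Data, FUN = sum)
res
```

Rows with an empty key_resp.keys value contribute 0, so participants who never pressed a key in a session still appear in the summary.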

Issues with pivot_wider and unique identifiers because of duplicate values

I'm trying to use pivot_wider to move my dataset from long to wide so I can use it in a different programme.
I have seen the other posts on this topic but the solutions don't address my problem.
I have a measurement variable called "rating", which has a value for each "rock" and each test ("gentest", first and second). I have an id variable called "turkcode".
For each individual in the dataset, there are 18 ratings. The problem is that there are 4 ratings for rock #8 and I think this is why the data won't pivot wider the way I want them to.
Here's a subset of the data
structure(list(turkcode = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), .Label = c("100879",
"104655", "108505", "110324", "110600", "112445", "114083", "115814",
"116573", "117411", "117817", "118651", "119324", "121548", "121883",
"121918", "123275", "123718", "125491", "127450", "127825", "128062",
"129061", "131404", "135358", "135594", "135671", "135945", "137951",
"138675", "139469", "140924", "145730", "147222", "148533", "150851",
"153455", "158882", "164468", "166907", "169260", "171463", "172398",
"175565", "177108", "179000", "180270", "183953", "185574", "185880",
"185948", "186371", "187787", "189220", "190014", "192550", "193904",
"195308", "196755", "197493", "198368", "200155", "200297", "201915",
"214519", "215994", "217903", "218771", "219302", "220434", "222740",
"223223", "224721", "225118", "225223", "229856", "229874", "231301",
"232576", "233842", "234215", "237581", "239567", "240609", "241098",
"241423", "242108", "244633", "246055", "251597", "252929", "255252",
"256652", "259936", "274962", "277053", "279422", "280317", "282602",
"283750", "285737", "286259", "287544", "288507", "290503", "291401",
"291835", "292160", "294117", "297863", "298061", "299347", "299499",
"301399", "304875", "305231", "306312", "307410", "308979", "311157",
"311524", "311630", "318956", "318988", "319995", "321405", "324288",
"327086", "327559", "328345", "328401", "330318", "330909", "332723",
"334115", "334517", "335811", "335831", "337145", "338323", "338542",
"338575", "340083", "341182", "343612", "343947", "344554", "346476",
"349874", "350117", "350433", "350972", "351187", "355311", "356717",
"359366", "360048", "360058", "361191", "361971", "362827", "363543",
"367244", "374254", "374965", "376278", "377622", "382139", "382916",
"384586", "385229", "386782", "388951", "389029", "390299", "390662",
"396335", "396732", "398076", "398573", "399276", "399587", "403388",
"406073", "406160", "411977", "412935", "417350", "420060", "421393",
"422944", "424462", "427143", "429291", "430758", "431629", "431638",
"431935", "432218", "433788", "434291", "436681", "437087", "439385",
"439499", "440477", "440834", "441253", "441876", "443826", "444080",
"447597", "452643", "454649", "457055", "457946", "463512", "464079",
"464123", "467897", "468650", "470211", "471115", "471512", "475493",
"476937", "479198", "482871", "484066", "484070", "485462", "486402",
"491701", "491835", "499644", "501833", "502335", "502373", "504800",
"507439", "507946", "507987", "509066", "513078", "515519", "517017",
"517988", "519144", "519210", "519858", "522847", "523683", "525315",
"528577", "532463", "532630", "533028", "539033", "539852", "540690",
"546773", "546916", "549652", "551599", "554198", "556066", "559920",
"560804", "560857", "562080", "562420", "563841", "565668", "565776",
"566509", "569039", "572553", "575364", "576421", "576694", "576877",
"577120", "577155", "577534", "577605", "578463", "578820", "578995",
"580213", "581893", "582433", "582905", "583887", "584569", "585314",
"585566", "587393", "589144", "592284", "594463", "596863", "601837",
"602632", "604254", "605885", "609296", "609963", "610062", "612437",
"612949", "613161", "614372", "614777", "615372", "615384", "616927",
"618118", "620041", "620336", "621634", "622289", "624098", "626163",
"626612", "627019", "627856", "630003", "630255", "634018", "634478",
"635801", "638606", "640012", "641078", "641366", "641436", "641821",
"642076", "642446", "643329", "643942", "644015", "646792", "647254",
"647700", "649516", "650792", "650810", "651229", "652387", "652671",
"654778", "657964", "658894", "660500", "660607", "664469", "666754",
"666796", "668996", "669712", "671682", "673516", "675712", "677835",
"678008", "679262", "680295", "686455", "690471", "691175", "692489",
"694023", "696001", "698716", "700133", "700641", "707812", "707953",
"708010", "708881", "713657", "715255", "715386", "716764", "718936",
"719956", "725348", "727753", "728436", "729588", "730513", "731928",
"732013", "732438", "733366", "733559", "734672", "735174", "735675",
"737044", "737127", "741264", "745262", "748173", "748414", "748943",
"749221", "749963", "750363", "753518", "754512", "754970", "758639",
"760838", "761642", "766250", "770646", "772574", "773054", "775271",
"776762", "778208", "779453", "781378", "781861", "782257", "785763",
"785860", "787011", "790280", "791735", "791903", "792178", "796650",
"796822", "796970", "798621", "802731", "804701", "805606", "807848",
"809142", "810539", "812182", "812321", "814029", "814545", "814774",
"815079", "816572", "824215", "825063", "827763", "829973", "829983",
"830126", "832112", "832666", "833066", "834756", "835270", "835340",
"837413", "837746", "839882", "846097", "847975", "848746", "851745",
"851975", "856622", "858918", "859174", "859182", "859726", "859850",
"862222", "864356", "865028", "869700", "871576", "872256", "873350",
"873597", "875873", "883140", "886308", "886592", "886706", "892144",
"893930", "894959", "896820", "900374", "901373", "902879", "904147",
"905194", "906305", "908049", "908798", "911505", "913314", "915390",
"915833", "919057", "922432", "924120", "925640", "927671", "932006",
"936810", "936916", "938349", "940727", "941945", "942271", "943188",
"944548", "945783", "947164", "948322", "949181", "951414", "952632",
"955090", "956428", "956985", "959916", "960349", "962224", "962980",
"964665", "967160", "967588", "969929", "972543", "972893", "977734",
"978083", "978981", "980427", "980782", "981541", "981850", "982220",
"983781", "985193", "986366", "988934", "989056", "991218", "991914",
"995411", "995630", "995873", "995936", "996309"), class = "factor"),
aid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("mem",
"noMem"), class = "factor"), gentest = structure(c(1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L,
2L, 1L, 2L), .Label = c("first", "second"), class = "factor"),
rocks = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 5L, 5L,
6L, 6L, 7L, 7L, 8L, 8L, 8L, 8L, 1L, 1L), .Label = c("R1",
"R2", "R3", "R4", "R5", "R6", "R7", "R8"), class = "factor"),
rating = c(7L, 5L, 2L, 7L, 4L, 2L, 6L, 3L, 3L, 2L, 3L, 3L,
2L, 1L, 3L, 6L, 3L, 2L, 2L, 4L), condition = structure(c(2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L), .Label = c("baseline", "category", "property"
), class = "factor"), order = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L), .Label = c("after", "before", "none"), class = "factor")), row.names = c(NA,
-20L), class = c("grouped_df", "tbl_df", "tbl", "data.frame"), groups = structure(list(
turkcode = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L), .Label = c("100879",
"104655", "108505", "110324", "110600", "112445", "114083",
"115814", "116573", "117411", "117817", "118651", "119324",
"121548", "121883", "121918", "123275", "123718", "125491",
"127450", "127825", "128062", "129061", "131404", "135358",
"135594", "135671", "135945", "137951", "138675", "139469",
"140924", "145730", "147222", "148533", "150851", "153455",
"158882", "164468", "166907", "169260", "171463", "172398",
"175565", "177108", "179000", "180270", "183953", "185574",
"185880", "185948", "186371", "187787", "189220", "190014",
"192550", "193904", "195308", "196755", "197493", "198368",
"200155", "200297", "201915", "214519", "215994", "217903",
"218771", "219302", "220434", "222740", "223223", "224721",
"225118", "225223", "229856", "229874", "231301", "232576",
"233842", "234215", "237581", "239567", "240609", "241098",
"241423", "242108", "244633", "246055", "251597", "252929",
"255252", "256652", "259936", "274962", "277053", "279422",
"280317", "282602", "283750", "285737", "286259", "287544",
"288507", "290503", "291401", "291835", "292160", "294117",
"297863", "298061", "299347", "299499", "301399", "304875",
"305231", "306312", "307410", "308979", "311157", "311524",
"311630", "318956", "318988", "319995", "321405", "324288",
"327086", "327559", "328345", "328401", "330318", "330909",
"332723", "334115", "334517", "335811", "335831", "337145",
"338323", "338542", "338575", "340083", "341182", "343612",
"343947", "344554", "346476", "349874", "350117", "350433",
"350972", "351187", "355311", "356717", "359366", "360048",
"360058", "361191", "361971", "362827", "363543", "367244",
"374254", "374965", "376278", "377622", "382139", "382916",
"384586", "385229", "386782", "388951", "389029", "390299",
"390662", "396335", "396732", "398076", "398573", "399276",
"399587", "403388", "406073", "406160", "411977", "412935",
"417350", "420060", "421393", "422944", "424462", "427143",
"429291", "430758", "431629", "431638", "431935", "432218",
"433788", "434291", "436681", "437087", "439385", "439499",
"440477", "440834", "441253", "441876", "443826", "444080",
"447597", "452643", "454649", "457055", "457946", "463512",
"464079", "464123", "467897", "468650", "470211", "471115",
"471512", "475493", "476937", "479198", "482871", "484066",
"484070", "485462", "486402", "491701", "491835", "499644",
"501833", "502335", "502373", "504800", "507439", "507946",
"507987", "509066", "513078", "515519", "517017", "517988",
"519144", "519210", "519858", "522847", "523683", "525315",
"528577", "532463", "532630", "533028", "539033", "539852",
"540690", "546773", "546916", "549652", "551599", "554198",
"556066", "559920", "560804", "560857", "562080", "562420",
"563841", "565668", "565776", "566509", "569039", "572553",
"575364", "576421", "576694", "576877", "577120", "577155",
"577534", "577605", "578463", "578820", "578995", "580213",
"581893", "582433", "582905", "583887", "584569", "585314",
"585566", "587393", "589144", "592284", "594463", "596863",
"601837", "602632", "604254", "605885", "609296", "609963",
"610062", "612437", "612949", "613161", "614372", "614777",
"615372", "615384", "616927", "618118", "620041", "620336",
"621634", "622289", "624098", "626163", "626612", "627019",
"627856", "630003", "630255", "634018", "634478", "635801",
"638606", "640012", "641078", "641366", "641436", "641821",
"642076", "642446", "643329", "643942", "644015", "646792",
"647254", "647700", "649516", "650792", "650810", "651229",
"652387", "652671", "654778", "657964", "658894", "660500",
"660607", "664469", "666754", "666796", "668996", "669712",
"671682", "673516", "675712", "677835", "678008", "679262",
"680295", "686455", "690471", "691175", "692489", "694023",
"696001", "698716", "700133", "700641", "707812", "707953",
"708010", "708881", "713657", "715255", "715386", "716764",
"718936", "719956", "725348", "727753", "728436", "729588",
"730513", "731928", "732013", "732438", "733366", "733559",
"734672", "735174", "735675", "737044", "737127", "741264",
"745262", "748173", "748414", "748943", "749221", "749963",
"750363", "753518", "754512", "754970", "758639", "760838",
"761642", "766250", "770646", "772574", "773054", "775271",
"776762", "778208", "779453", "781378", "781861", "782257",
"785763", "785860", "787011", "790280", "791735", "791903",
"792178", "796650", "796822", "796970", "798621", "802731",
"804701", "805606", "807848", "809142", "810539", "812182",
"812321", "814029", "814545", "814774", "815079", "816572",
"824215", "825063", "827763", "829973", "829983", "830126",
"832112", "832666", "833066", "834756", "835270", "835340",
"837413", "837746", "839882", "846097", "847975", "848746",
"851745", "851975", "856622", "858918", "859174", "859182",
"859726", "859850", "862222", "864356", "865028", "869700",
"871576", "872256", "873350", "873597", "875873", "883140",
"886308", "886592", "886706", "892144", "893930", "894959",
"896820", "900374", "901373", "902879", "904147", "905194",
"906305", "908049", "908798", "911505", "913314", "915390",
"915833", "919057", "922432", "924120", "925640", "927671",
"932006", "936810", "936916", "938349", "940727", "941945",
"942271", "943188", "944548", "945783", "947164", "948322",
"949181", "951414", "952632", "955090", "956428", "956985",
"959916", "960349", "962224", "962980", "964665", "967160",
"967588", "969929", "972543", "972893", "977734", "978083",
"978981", "980427", "980782", "981541", "981850", "982220",
"983781", "985193", "986366", "988934", "989056", "991218",
"991914", "995411", "995630", "995873", "995936", "996309"
), class = "factor"), rocks = structure(c(1L, 1L, 2L, 2L,
3L, 3L, 4L, 4L, 5L, 5L, 6L, 6L, 7L, 7L, 8L, 8L, 1L, 1L), .Label = c("R1",
"R2", "R3", "R4", "R5", "R6", "R7", "R8"), class = "factor"),
gentest = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("first",
"second"), class = "factor"), .rows = list(1L, 2L, 3L, 4L,
5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 13L, 14L, 15:16, 17:18,
19L, 20L)), row.names = c(NA, -18L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE))
Does anyone know how I can modify the second set of ratings for rock #8 so that I can pivot the data wider or even exclude this data from the dataset altogether?
EDIT:
Here is an example of how I'd like the output to look
id <- rep("100879", times = 6)
aid <- rep("mem", times = 6)
test <- rep(c("first", "second"), times = 3)
order <- rep("after", times = 6)
condition <- rep ("cat", times = 6)
R1 <- sample(0:9, 6, replace=T)
R2 <- sample(0:9, 6, replace=T)
R3 <- sample(0:9, 6, replace=T)
R4 <- sample(0:9, 6, replace=T)
R5 <- sample(0:9, 6, replace=T)
R6 <- sample(0:9, 6, replace=T)
R7 <- sample(0:9, 6, replace=T)
R8 <- sample(0:9, 6, replace=T)
df <- cbind(id, aid, test, order, condition, R1, R2, R3, R4, R5, R6, R7, R8)
A data.table suggestion:
library( data.table )
#set data as data.table
setDT( mydata )
#create rowid by group
mydata[, row_id := rowidv( mydata, cols = c("turkcode", "aid", "gentest", "condition", "order", "rocks") ) ]
#create new rocks-column to group on
mydata[, rocks2 := paste0( rocks, ifelse( row_id == 1, "", paste0("_",row_id ) ) ) ]
#now cast to wide
dcast( mydata, turkcode + aid + gentest + condition + order ~ rocks2, value.var = "rating" )
# turkcode aid gentest condition order R1 R2 R3 R4 R5 R6 R7 R8 R8_2
# 1: 100879 mem first category after 7 2 4 6 3 3 2 3 6
# 2: 100879 mem second category after 5 7 2 3 2 3 1 3 2
# 3: 104655 mem first category after 2 NA NA NA NA NA NA NA NA
# 4: 104655 mem second category after 4 NA NA NA NA NA NA NA NA
Another option using pivot_wider and separate
library(dplyr)
library(tidyr)
#short version, but you will end up with R1-R8 in list format
df %>%
pivot_wider(id_cols = c("turkcode", "aid", "gentest", "condition", "order"),
names_from = "rocks", values_from = "rating", values_fn = list(rating = list))
#clean version
df %>%
#id_cols: A set of columns that uniquely identifies each observation.
#Defaults to all columns in data except for the columns specified in names_from and values_from.
pivot_wider(id_cols = c("turkcode", "aid", "gentest", "condition", "order"),
names_from = "rocks",
values_from = "rating",
values_fn = list(rating = ~paste(., collapse = ","))
#values_fn = list(rating = mean)
#,values_fill = list(rating=0)
) %>%
separate(R8, into = c('R8','R8_1'))
# A tibble: 4 x 14
# Groups: turkcode, gentest [1,118]
turkcode aid gentest condition order R1 R2 R3 R4 R5 R6 R7 R8 R8_1
<fct> <fct> <fct> <fct> <fct> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 100879 mem first category after 7 2 4 6 3 3 2 3 6
2 100879 mem second category after 5 7 2 3 2 3 1 3 2
3 104655 mem first category after 2 NA NA NA NA NA NA NA NA
4 104655 mem second category after 4 NA NA NA NA NA NA NA NA
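If the goal is instead to exclude the second set of ratings for rock #8, one possible approach is to number the repeated ratings within each turkcode, gentest and rocks combination and keep only the first. The base R sketch below uses six rows taken from the question's data (R7 and R8 for turkcode 100879); the names key, occurrence and first_only are introduced here for illustration.

```r
# Rows 13 to 18 of the sample data: R7 once per test, R8 twice per test
d <- data.frame(
  turkcode = "100879",
  gentest = c("first", "second", "first", "first", "second", "second"),
  rocks = c("R7", "R7", "R8", "R8", "R8", "R8"),
  rating = c(2L, 1L, 3L, 6L, 3L, 2L)
)

# One key per combination that should identify a single rating
key <- interaction(d$turkcode, d$gentest, d$rocks, drop = TRUE)

# Running count within each key: 1 for the first rating, 2 for a repeat
d$occurrence <- ave(seq_len(nrow(d)), key, FUN = seq_along)

# Keep only the first rating of each combination
first_only <- d[d$occurrence == 1, ]
first_only
```

After this filter every combination is unique, so first_only (minus the occurrence column) should pivot wider without list columns. Whether the first or the second rating is the one to keep is a decision about the data that this sketch does not settle.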

R - replace values by row given some statement in if loop with another value in same df

I have a dataset on which I want to conduct a multilevel analysis. Therefore I have two rows for every patient, and a column named couple containing 1's and 2's (1 = patient, 2 = partner of patient).
Now, I have variables with date of birth and age, for both patient and partner in different columns that are now on the same row.
What I want to do is to write a code that does:
if mydata$couple == 2, then replace mydata$dateofbirthpatient with mydata$dateofbirthpartner
And that for every row. Since I have multiple variables that I want to replace, it would be lovely if I could get this in a loop and just 'add' variables that I want to replace.
What I tried so far:
mydf_longer <- if (mydf_long$couple == 2) {
mydf_long$pgebdat <- mydf_long$prgebdat
}
Of course this wasn't working, but simply stated this is what I want.
I started with this code, following the example in "By row, replace values equal to value in specified column", but don't know how to finish:
mydf_longer[6:7][mydf_longer[,1:4]==mydf_longer[2,2]] <-
Any ideas? Let me know if you need more information.
Example of data:
# id couple groep_MNC zkhs fbeh pgebdat p_age pgesl prgebdat pr_age
# 1 3 1 1 1 1 1955-12-01 42.50000 1 <NA> NA
# 1.1 3 2 1 1 1 1955-12-01 42.50000 1 <NA> NA
# 2 5 1 1 1 1 1943-04-09 55.16667 1 1962-04-18 36.5
# 2.1 5 2 1 1 1 1943-04-09 55.16667 1 1962-04-18 36.5
# 3 7 1 1 1 1 1958-04-10 40.25000 1 <NA> NA
# 3.1 7 2 1 1 1 1958-04-10 40.25000 1 <NA> NA
mydf_long <- structure(
list(id = c(3L, 3L, 5L, 5L, 7L, 7L),
couple = c(1L, 2L, 1L, 2L, 1L, 2L),
groep_MNC = c(1L, 1L, 1L, 1L, 1L, 1L),
zkhs = c(1L, 1L, 1L, 1L, 1L, 1L),
fbeh = c(1L, 1L, 1L, 1L, 1L, 1L),
pgebdat = structure(c(-5145, -5145, -9764, -9764, -4284, -4284), class = "Date"),
p_age = c(42.5, 42.5, 55.16667, 55.16667, 40.25, 40.25),
pgesl = c(1L, 1L, 1L, 1L, 1L, 1L),
prgebdat = structure(c(NA, NA, -2815, -2815, NA, NA), class = "Date"),
pr_age = c(NA, NA, 36.5, 36.5, NA, NA)),
.Names = c("id", "couple", "groep_MNC", "zkhs", "fbeh", "pgebdat",
"p_age", "pgesl", "prgebdat", "pr_age"),
row.names = c("1", "1.1", "2", "2.1", "3", "3.1"),
class = "data.frame"
)
The following for loop should work if you only want to change the values based on a condition:
for(i in 1:nrow(mydata)){
if(mydata$couple[i] == 2){
mydata$pgebdat[i] <- mydata$prgebdat[i]
}
}
OR
As suggested by @lmo, the following vectorized assignment will be faster:
mydata$pgebdat[mydata$couple == 2] <- mydata$prgebdat[mydata$couple == 2]
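Since the question asks for a way to add further variables easily, the vectorized assignment can be wrapped in a loop over column pairs. This is a minimal sketch on a reduced version of the example data; the lookup vector pairs and the names is_partner and target are assumptions introduced here.

```r
# Reduced version of the example data (ids 3 and 5 only)
mydf_long <- data.frame(
  id = c(3L, 3L, 5L, 5L),
  couple = c(1L, 2L, 1L, 2L),
  pgebdat = as.Date(c("1955-12-01", "1955-12-01", "1943-04-09", "1943-04-09")),
  p_age = c(42.5, 42.5, 55.16667, 55.16667),
  prgebdat = as.Date(c(NA, NA, "1962-04-18", "1962-04-18")),
  pr_age = c(NA, NA, 36.5, 36.5)
)

# Hypothetical lookup: name = column to overwrite, value = column to copy from.
# Add more pairs here to handle more variables.
pairs <- c(pgebdat = "prgebdat", p_age = "pr_age")

# Rows that describe the partner
is_partner <- mydf_long$couple == 2

for (target in names(pairs)) {
  mydf_long[is_partner, target] <- mydf_long[is_partner, pairs[[target]]]
}
mydf_long
```

Partner rows with no partner data end up as NA in the overwritten columns, which matches what the single-column assignment does.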

Calculating ratios by group with dplyr

Using the following dataframe I would like to group the data by replicate and group and then calculate a ratio of treatment values to control values.
structure(list(group = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L), .Label = c("case", "controls"), class = "factor"), treatment = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "EPA", class = "factor"),
replicate = structure(c(2L, 4L, 3L, 1L, 2L, 4L, 3L, 1L), .Label = c("four",
"one", "three", "two"), class = "factor"), fatty_acid_family = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "saturated", class = "factor"),
fatty_acid = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "14:0", class = "factor"),
quant = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755
)), .Names = c("group", "treatment", "replicate", "fatty_acid_family",
"fatty_acid", "quant"), class = "data.frame", row.names = c(NA,
-8L))
I have tried using dplyr as follows:
group_by(dataIn, replicate, group) %>% transmute(ratio = quant[group=="case"]/quant[group=="controls"])
but this results in Error: incompatible size (%d), expecting %d (the group size) or 1
Initially I thought this might be because I was trying to create 4 ratios from a df 8 rows deep, so I thought summarise might be the answer (collapsing each group to one ratio), but that doesn't work either (my understanding falls short here).
group_by(dataIn, replicate, group) %>% summarise(ratio = quant[group=="case"]/quant[group=="controls"])
replicate group ratio
1 four case NA
2 four controls NA
3 one case NA
4 one controls NA
5 three case NA
6 three controls NA
7 two case NA
8 two controls NA
I would appreciate some advice on where I'm going wrong or even if this can be done with dplyr.
Thanks.
You can try:
group_by(dataIn, replicate) %>%
summarise(ratio = quant[group=="case"]/quant[group=="controls"])
#Source: local data frame [4 x 2]
#
# replicate ratio
#1 four 1.078562
#2 one 1.333333
#3 three 1.070573
#4 two 1.446449
Because you grouped by replicate and group, you could not access data from different groups at the same time.
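The summarise call above assumes exactly one case row and one controls row per replicate. A more explicit way to pair them, sketched here in base R as an alternative rather than as the answer's method, is to split the two groups and merge them on replicate. Only the group, replicate and quant columns from the question are used; cases, ctrl and both are names introduced for illustration.

```r
dataIn <- data.frame(
  group = rep(c("case", "controls"), each = 4),
  replicate = rep(c("one", "two", "three", "four"), times = 2),
  quant = c(6.16, 6.415, 4.02, 4.05, 4.62, 4.435, 3.755, 3.755)
)

# Split into the two groups, keeping the pairing key and the value
cases <- dataIn[dataIn$group == "case", c("replicate", "quant")]
ctrl <- dataIn[dataIn$group == "controls", c("replicate", "quant")]

# Pair each case with the control from the same replicate
both <- merge(cases, ctrl, by = "replicate",
              suffixes = c("_case", "_controls"))

both$ratio <- both$quant_case / both$quant_controls
both
```

Because the pairing is done by merge, a replicate that lacks either a case or a control is dropped from the result instead of causing a size error.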
@talat's answer solved it for me. I created a minimal reproducible example to help my own understanding:
df <- structure(list(a = c("a", "a", "b", "b", "c", "c", "d", "d"),
b = c(1, 2, 1, 2, 1, 2, 1, 2), c = c(22, 15, 5, 0.2, 107,
6, 0.2, 4)), row.names = c(NA, -8L), class = c("tbl_df",
"tbl", "data.frame"))
# a b c
# 1 a 1 22.0
# 2 a 2 15.0
# 3 b 1 5.0
# 4 b 2 0.2
# 5 c 1 107.0
# 6 c 2 6.0
# 7 d 1 0.2
# 8 d 2 4.0
library(dplyr)
df %>%
group_by(a) %>%
summarise(prop = c[b == 1] / c[b == 2])
# a prop
# 1 a 1.466667
# 2 b 25.000000
# 3 c 17.833333
# 4 d 0.050000

Resources