Make output of two rows into columns R - r

I am currently working with behavioural data in R from video analyses in BORIS. Every observation is 15 seconds and during this observation I noted the subject, its behaviour but also some background information such as the date, time of day, temperature, etc. However, the program has put this background information under the column "Behaviour" (so one of the behaviours is now "date") and its output under the column "Modifier" (which now says "15-10-2020" for example).
What I want is to create new columns for date, time, etc. (taken from the column "Behaviour") and fill them with the corresponding values (from the column "Modifier"), so that every behaviour has a subject, date, time, temperature, and so forth. However, I have no idea how to do this.
I thought about using the function aggregate, but this gives me lots of extra rows with mainly NA's. I also looked into the package "tibble" but can't really make that work either.
Any suggestions would be greatly appreciated!
Some example rows (from dput()):
structure(list(Subject = structure(c(2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 7L), .Label = c("fallow deer female", "fallow deer female + calf",
"red deer female + calf", "roe deer male", "wild boar + young",
"wild boar male", "wild boar unknown sex"), class = "factor"),
Behavior = structure(c(1L, 2L, 8L, 7L, 12L, 3L, 5L, 10L,
6L, 4L), .Label = c("auditory vigilant", "date", "day/night",
"foraging", "nr. of individuals", "running", "temperature",
"time of day", "unknown behaviour", "walking", "walking while vigilant",
"weather"), class = "factor"), Behavioral.category = structure(c(4L,
2L, 2L, 2L, 2L, 2L, 2L, 3L, 4L, 3L), .Label = c("", "Background information",
"Non-vigilant", "Vigilant"), class = "factor"), Modifiers = structure(c(1L,
4L, 21L, 27L, 35L, 36L, 32L, 1L, 1L, 1L), .Label = c("",
"0346", "0347", "07172020", "07182020", "07212020", "07242020",
"07262020", "07272020", "08032020", "08052020", "1", "12",
"1307", "1327", "1342", "1343", "1430", "1528", "16", "1604",
"17", "1744", "21", "2119", "2120", "22", "23", "25", "26",
"3", "4", "7", "Clear", "Cloudy", "Day", "Night"), class = "factor")), row.names = c(NA,
10L), class = "data.frame")
The output that I'd like to have would give as column names: Subject; Behavior; Date; Time of Day; Temperature. The modifier output would be the values of the columns "Date", "Time of Day", "Temperature". When this works, I could delete the column Modifiers (since all its values are already in assigned columns).

Split the dataframe into actual behaviours and background information. Run this code on the background information:
tidyr::pivot_wider(your_data, names_from = Behavior, values_from = Modifiers)
Merge the dataframes!
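The three steps above could look roughly like this. It is only a sketch: the small data frame reuses a few values from the dput() sample, and it assumes that Subject alone identifies an observation, which may not hold in the full export (a per-observation ID column would be a safer key if BORIS provides one).

```r
library(dplyr)
library(tidyr)

# Small stand-in for the BORIS export (values taken from the dput() sample)
obs <- data.frame(
  Subject = rep("fallow deer female + calf", 5),
  Behavior = c("auditory vigilant", "date", "time of day", "temperature", "walking"),
  Behavioral.category = c("Vigilant", "Background information",
                          "Background information", "Background information",
                          "Non-vigilant"),
  Modifiers = c("", "07172020", "1604", "22", ""),
  stringsAsFactors = FALSE
)

# 1. Background rows: one column per background variable
background <- obs %>%
  filter(Behavioral.category == "Background information") %>%
  select(Subject, Behavior, Modifiers) %>%
  pivot_wider(names_from = Behavior, values_from = Modifiers)

# 2. Rows that describe actual behaviours
behaviours <- obs %>%
  filter(Behavioral.category != "Background information") %>%
  select(Subject, Behavior)

# 3. Attach the background columns to every behaviour row
result <- left_join(behaviours, background, by = "Subject")
```

The Modifiers column is no longer needed after the join, because its values now live in the date, time of day and temperature columns.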


combine multiple elements in a list with different indexes in r

I have a list and I need to add together elements with different indexes. I'm struggling because I want to create a loop at different indexes.
library(pROC)
data(aSAH)
rocobj <- roc(aSAH$outcome, aSAH$s100b)
dat<-coords(rocobj, "all", ret=c("threshold","sensitivity", "specificity"), as.list=TRUE)
I want to create a function where I can look at all the sensitivity/1-specificity combos at all thresholds in a new data frame. I know threshold is found in dat[1,], sensitivity is found in dat[2,] and specificity is found in dat[3,]. So I tried:
for (i in length(dat)) {
print(dat[1,i])
print(dat[2,i]/(1-dat[3,i]))
}
Where I should end up with a dataframe that has threshold and sensitivity/1-specificity.
DATA
dput(head(aSAH))
structure(list(gos6 = structure(c(5L, 5L, 5L, 5L, 1L, 1L), .Label = c("1",
"2", "3", "4", "5"), class = c("ordered", "factor")), outcome = structure(c(1L,
1L, 1L, 1L, 2L, 2L), .Label = c("Good", "Poor"), class = "factor"),
gender = structure(c(2L, 2L, 2L, 2L, 2L, 1L), .Label = c("Male",
"Female"), class = "factor"), age = c(42L, 37L, 42L, 27L,
42L, 48L), wfns = structure(c(1L, 1L, 1L, 1L, 3L, 2L), .Label = c("1",
"2", "3", "4", "5"), class = c("ordered", "factor")), s100b = c(0.13,
0.14, 0.1, 0.04, 0.13, 0.1), ndka = c(3.01, 8.54, 8.09, 10.42,
17.4, 12.75)), .Names = c("gos6", "outcome", "gender", "age",
"wfns", "s100b", "ndka"), row.names = 29:34, class = "data.frame")
EDIT
One answer:
library(dplyr)
dat_transform <- as.data.frame(t(dat))
dat_transform <- dat_transform %>% mutate(new = sensitivity / (1 - specificity))
You can use:
transform(t, res = sensitivity/(1-specificity))[c(1, 4)]
Or with dplyr:
library(dplyr)
t %>%
mutate(res = sensitivity/(1-specificity)) %>%
select(threshold, res)
Also note that t() is a base R function that transposes a matrix or data frame, so it is better to use some other variable name for the data frame.
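A minimal base R sketch of the same idea, without pROC or dplyr. It assumes `dat` is a matrix with one row per measure and one column per threshold, as the indexing in the question suggests; the numbers are made up, and `roc_df` is a hypothetical name chosen so that `t()` is not masked.

```r
# Made-up stand-in for the coords() result: rows are measures, columns are thresholds
dat <- rbind(
  threshold   = c(0.1, 0.2, 0.3),
  sensitivity = c(0.9, 0.6, 0.3),
  specificity = c(0.2, 0.5, 0.8)
)

# Transpose so that each measure becomes a column, then compute the ratio per row
roc_df <- as.data.frame(t(dat))
roc_df$res <- roc_df$sensitivity / (1 - roc_df$specificity)

# Keep only the threshold and the new ratio
result <- roc_df[c("threshold", "res")]
```

No loop is needed because the division is vectorised over all thresholds at once.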

Error in ggplot

I am trying to make a ggplot. When I had shape in aesthetics, the code was working just fine. However, I need to put shape in geom_point() because I'm trying to reproduce a figure. And when I added shape to geom_point() it gave me the following error:
Aesthetics must be either length 1 or the same as the data (6): shape
I've looked for other answers here but apparently, nothing seems to be working for me. Above I've provided with an image of what my data looks like. There are 17000 entries.
Below is my code:
summarised_data <-ddply(mammals,c('mammals$chr','mammals$Species','mammals$chrMark'),
function (x) c(median_rpkm = median(x$RPKM), median = median(x$dNdS)))
ggplot(summarised_data,aes(x = summarised_data$median_rpkm, y = summarised_data$median,
color = summarised_data$`mammals$Species`)) + geom_smooth(se = FALSE, method = "lm") +
geom_point(shape = summarised_data$`mammals$chrMark`) + xlab("median RPKM") + ylab("dNdS")
"ENSG00000213221", "ENSG00000213341", "ENSG00000213380", "ENSG00000213424",
"ENSG00000213533", "ENSG00000213551", "ENSG00000213619", "ENSG00000213626",
"ENSG00000213699", "ENSG00000213782", "ENSG00000213949", "ENSG00000214013",
"ENSG00000214338", "ENSG00000214357", "ENSG00000214367", "ENSG00000214517",
"ENSG00000214814", "ENSG00000215203", "ENSG00000215305", "ENSG00000215367",
"ENSG00000215440", "ENSG00000215897", "ENSG00000221947", "ENSG00000222011",
"ENSG00000224051", "ENSG00000225830", "ENSG00000225921", "ENSG00000239305",
"ENSG00000239474", "ENSG00000239900", "ENSG00000241058", "ENSG00000242247",
"ENSG00000242612", "ENSG00000243646", "ENSG00000244038", "ENSG00000244045"),
class = "factor"), Species = structure(c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = c("Chimp", "Gori", "Human", "Maca",
"Mouse", "Oran"), class = "factor"), labs = structure(c(2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Chimp-A", "Chimp-X",
"Gori-A", "Gori-X", "Human-A", "Human-X", "Maca-A", "Maca-X",
"Mouse-A", "Mouse-X", "Oran-A", "Oran-X"), class = "factor"),
chrMark = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L), .Label = c("A", "X"), class = "factor"), chr = structure(c(27L,
27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L, 27L), .Label = c("1",
"10", "11", "12", "13", "14", "15", "16", "17", "18", "19",
"2", "20", "21", "22", "2a", "2A", "2b", "2B", "3", "4",
"5", "6", "7", "8", "9", "X"), class = "factor"), dN = c(3.00669,
3.27182, 7.02044, 1.01784, 3.0363, 2.32786, 4.92959, 3.03753,
3.0776, 1.02147), dS = c(3.15631, 5.87147, 3.13716, 2.05438,
4.10205, 5.24764, 4.2014, 3.18086, 5.4942, 3.02169), dNdS = c(0.9525965447,
0.5572403504, 2.2378329444, 0.4954487485, 0.7401908802, 0.4436013141,
1.1733207978, 0.954939859, 0.5601543446, 0.3380459279), RPKM = c(31.6,
13.9, 26.3, 9.02, 11.3, 137, 242, 1.05, 59.4, 10.1), Tau = c(0.7113820598,
0.8391023102, 0.3185943152, 0.6887167806, 0.9120531859, 0.6254200542,
0.7165302682, 0.7257435312, 0.2586613298, 0.6493567251),
GC3 = c(0.615502, 0.622543, 0.393064, 0.490141, 0.461592,
0.626407, 0.490305, 0.482853, 0.346424, 0.466484)), .Names = c("gene",
"Species", "labs", "chrMark", "chr", "dN", "dS", "dNdS", "RPKM",
"Tau", "GC3"), row.names = c(NA, 10L), class = "data.frame")
There are a few things wrong with your code, mostly to do with how ggplot handles non-standard evaluation; I'd recommend reading a ggplot tutorial or the docs. Having columns within summarised_data called 'mammals$Species' and 'mammals$chrMark' is going to cause lots of problems.
If we change these to something more sensible...
names(summarised_data)[names(summarised_data) == "mammals$Species"] <- "mammals_species"
names(summarised_data)[names(summarised_data) == "mammals$chrMark"] <- "mammals_chrMark"
We can make the ggplot code more friendly. Note that shape has to be within aes(), as you're mapping it to your data.
ggplot(summarised_data, aes(x = median_rpkm, y = median)) +
geom_smooth(se = FALSE, method = "lm") +
geom_point(aes(shape = mammals_chrMark,
color = mammals_species)) +
xlab("median RPKM") + ylab("dNdS")
Hopefully this should work, or at least get you somewhere closer to an answer.
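The awkward column names come from passing strings such as 'mammals$chr' as grouping variables. Here is a sketch of one way to avoid the rename step altogether. It assumes the columns in mammals are named as in the dput() output (chr, Species, chrMark, RPKM, dNdS); the toy data frame is invented for illustration, and base aggregate() is swapped in for ddply().

```r
library(ggplot2)

# Toy stand-in for `mammals` (invented values, column names as in the dput() output)
mammals <- data.frame(
  chr     = c("X", "X", "1", "1", "X", "X", "1", "1"),
  Species = c("Chimp", "Chimp", "Chimp", "Chimp", "Human", "Human", "Human", "Human"),
  chrMark = c("X", "X", "A", "A", "X", "X", "A", "A"),
  RPKM    = c(31.6, 13.9, 26.3, 9.02, 11.3, 137, 242, 1.05),
  dNdS    = c(0.95, 0.56, 2.24, 0.50, 0.74, 0.44, 1.17, 0.95)
)

# Group by bare column names so the summary keeps clean names
summarised_data <- aggregate(cbind(RPKM, dNdS) ~ chr + Species + chrMark,
                             data = mammals, FUN = median)
names(summarised_data)[names(summarised_data) == "RPKM"] <- "median_rpkm"
names(summarised_data)[names(summarised_data) == "dNdS"] <- "median"

# shape and colour are mapped to data, so both go inside aes()
p <- ggplot(summarised_data, aes(x = median_rpkm, y = median)) +
  geom_smooth(se = FALSE, method = "lm") +
  geom_point(aes(shape = chrMark, color = Species)) +
  xlab("median RPKM") + ylab("dNdS")
```

Calling print(p) draws the plot. A constant such as shape = 17 could still go outside aes(), since it is not mapped to a column.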

Unable to create plots in Rpres (RStudio's HTML presentation format)

I am trying to create an HTML5 presentation with ggplot2 plots in it. I am using RStudio's Rpres format. However, I see no plot in the output presentation. For the example below, I get a textbox with a message like this:
<img src="PT terms for each age
Strata.Rnw-figure/unnamed-chunk-1-1.png" title="plot of chunk
unnamed-chunk-1" alt="plot of chunk unnamed-chunk-1" style="display:
block; margin: auto;" />
For other chunks I see no plot at all. I see the figures generated in a subdirectory, but they are not included in the presentation.
This could be due to the fact that I use setwd to change the current directory inside one of the chunks.
How do I make sure the plots are added to the presentation?
```{r, echo=FALSE,fig.width=8, fig.height=4, warning=FALSE, eval=TRUE, message=FALSE, tidy=TRUE, fig.align='center',fig=TRUE}
PT.term.table.combo.df <- structure(list(term = structure(c(3L, 6L, 10L, 9L, 5L, 8L, 2L,
7L, 1L, 4L, 11L, 16L, 20L, 13L, 18L, 19L, 15L, 14L, 17L, 12L), .Label = c("Erythema",
"Injection site erythema", "Injection site pain", "Injection site swelling",
"Pain", "Pain in extremity", "Paraesthesia", "Pruritus", "Rash",
"Urticaria", "Dizziness", "Fatigue", "Headache", "Unknown",
"Loss of consciousness", "Nausea", "Pallor", "Pyrexia", "Syncope",
"Vomiting", "Blood pressure decreased", "Condition aggravated",
"Convulsion", "Fall", "Grand mal convulsion", "Head injury",
"Immediate post-injection reaction", "Condition8",
"Condition2", "Condition3", "Condition4",
"Condition1", "Menstruation delayed", "Menstruation irregular",
"Condition5", "Condition12", "Unevaluable event"
), class = "factor"), normalized.count = structure(c(0.758666519304954,
0.509556068608868, 0.498746392459638, 0.426484861272957, 0.41955098519173,
0.333070361160926, 0.306233446655841, 0.303395720748491, 0.281332387076534,
0.275858307359097, 2.05157281092953, 1.55514068644281, 0.761792303294041,
0.730331039886107, 0.553772087835693, 0.545722098808532, 0.426814370578148,
0.422207780194755, 0.401335815218956, 0.325021057176447), .Names = c("Injection site pain",
"Pain in extremity", "Urticaria", "Rash", "Pain", "Pruritus",
"Injection site erythema", "Paraesthesia", "Erythema", "Injection site swelling",
"Dizziness", "Nausea", "Vomiting", "Headache", "Pyrexia", "Syncope",
"Loss of consciousness", "Hyperhidrosis", "Pallor", "Fatigue"
)), source = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("1",
"2", "3", "4"), class = "factor")), .Names = c("term", "normalized.count",
"source"), row.names = c("Injection site pain", "Pain in extremity",
"Urticaria", "Rash", "Pain", "Pruritus", "Injection site erythema",
"Paraesthesia", "Erythema", "Injection site swelling", "Dizziness",
"Nausea", "Vomiting", "Headache", "Pyrexia", "Syncope", "Loss of consciousness",
"Hyperhidrosis", "Pallor", "Fatigue"), class = "data.frame")
library(ggplot2)
library(gdata)
#PT.term.table.combo.df <- combine(lapply( PT.term.tables$communities , FUN = function(x) { data.frame (term = names(x), normalized.count = x)}),names = 1:4)
PT.term.table.combo.df <- do.call(what=combine,args=lapply( PT.term.tables$communities , FUN = function(x) { data.frame (term = names(x), normalized.count = x)}))
levels(PT.term.table.combo.df$source)<- 1:4
#PT.term.table <- PT.term.tables$communities[[1]]
#term.df <- data.frame (term=names(PT.term.table), normalized.count = PT.term.table)
PT.plot<-ggplot(data=PT.term.table.combo.df, aes(x=term, y=normalized.count )) +
geom_bar(stat='identity') + coord_flip()+facet_wrap(~source)
print(PT.plot)
```
OK, I was able to fix my problem by renaming my Rpres file so that it does not contain any spaces. Instead of "PT terms for each age Strata.Rnw.Rpres" I chose "PT_terms_plots.Rpres" as the filename.
If you believe this might be a bug, comment to let me know so that I can contact the RStudio devs.

Reorder bars within ggplot2 dodged barplot, levels are correct

I'm making a series of barplots, all fairly similar to the one generated by the following code.
I've seen many posts about this, but I have tried to change the order of the variable of interest and that isn't working. Any tips?
library(ggplot2)
trtslope<-structure(list(Geno = structure(c(5L, 5L, 5L, 5L, 5L, 5L, 5L), .Label = c("1a", "1b", "1c", "1d", "1e", "1f", "2co", "2h", "4f", "5t", "pin3", "pin3pin7"), class ="factor"), Light = structure(c(2L,2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Constant","Cycle"),class = "factor"), Trmt = structure(c(6L, 3L, 5L, 2L, 7L, 4L, 1L), .Label = c("None","10^-11Kin", "10^-10Kin", "10^-9Kin", "10^-11IAA", "10^-10IAA",
"10^-9IAA"), class = "factor"), mean = c(-1.54504597523189,
-1.53362395751867, -1.57385562758997, -1.54322151503139,
-1.66574978029235, -1.32095137998064, -1.36520900266343),
sd = c(0.46315259286543, 0.458985115845406, 0.482009142703553,
0.641786961061545, 0.590265055416619, 0.378034730883596,
0.400241364404397), ste = c(0.0545830565232129, 0.0564971622467328,
0.0607274438691926, 0.0789985139281606, 0.0647900070055787,
0.0417469522407694, 0.0292685472783953), Conc = c("10^-10",
"10^-10", "10^-11", "10^-11", "10^-9", "10^-9", "None"),
Group = c("IAA", "Kinetin", "IAA", "Kinetin", "IAA", "Kinetin",
"None")), .Names = c("Geno", "Light", "Trmt", "mean", "sd", "ste", "Conc", "Group"), row.names = c(13L, 14L, 17L, 18L, 31L, 32L, 37L), class = "data.frame")
trtslopeplot<-ggplot(trtslope, aes(x=Group, y=mean, fill=Conc))+ geom_bar(stat="identity", position = position_dodge())+geom_errorbar(aes(ymin=mean-ste, ymax=mean+ste),position=position_dodge(.9),width=.2)
#I've tried to reorder the factors as I've done in the past, but that doesn't seem to work to change the plot
trtslope$Trmt <- factor( as.character(trtslope$Trmt), levels= c("None","10^-11Kin","10^-10Kin","10^-9Kin","10^-11IAA","10^-10IAA","10^-9IAA"))
trtslope <- trtslope[order(trtslope$Trmt),]
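One thing that may be worth checking: the plot maps fill to Conc and x to Group, not to Trmt, so reordering the levels of Trmt will probably not change the bars. Here is a hedged sketch that reorders Conc instead. The level order shown (None, 10^-11, 10^-10, 10^-9) is a guess at what is wanted, and the data frame is a cut-down copy of trtslope with rounded values.

```r
library(ggplot2)

# Cut-down stand-in for trtslope (values rounded from the dput() above)
trtslope <- data.frame(
  Group = c("IAA", "Kinetin", "IAA", "Kinetin", "IAA", "Kinetin", "None"),
  Conc  = c("10^-10", "10^-10", "10^-11", "10^-11", "10^-9", "10^-9", "None"),
  mean  = c(-1.55, -1.53, -1.57, -1.54, -1.67, -1.32, -1.37),
  ste   = c(0.055, 0.056, 0.061, 0.079, 0.065, 0.042, 0.029)
)

# Dodged bars follow the level order of the variable mapped to `fill`
trtslope$Conc <- factor(trtslope$Conc,
                        levels = c("None", "10^-11", "10^-10", "10^-9"))

trtslopeplot <- ggplot(trtslope, aes(x = Group, y = mean, fill = Conc)) +
  geom_bar(stat = "identity", position = position_dodge()) +
  geom_errorbar(aes(ymin = mean - ste, ymax = mean + ste),
                position = position_dodge(0.9), width = 0.2)
```

Conc is stored as a character vector in the original data, so ggplot2 falls back to alphabetical order unless it is turned into a factor with explicit levels.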

change border from around legend from a scatterplot

This should be simple, but I can't figure out how to remove the border from around my legend. I would also like to place the legend within the graph and remove the inner grid lines and the top and left side border. I am using the scatterplot function and this is the code I've written thus far:
scatterplot(Comp1~ln1wr|Season, moose,
xlab = "Risk", ylab = "Principal component 1",
labels= row.names(moose), by.groups=T, smooth=F, boxplots=F, legend.plot=F)
legend("bottomleft", moose, fill=0)
Here I was just experimenting to even see if I could get the legend to be placed somewhere else, but each time I run this code, I get an error
Error in as.graphicsAnnot(legend) :
argument "legend" is missing, with no default
I would like to place the legend within the graph, but where it will not conflict with the displayed data. Here is some sample data:
structure(list(ID = structure(c(1L, 1L, 1L, 1L, 1L, 32L, 33L,
33L, 34L, 34L, 34L), .Label = c("F07001", "F07002", "F07003",
"F07004", "F07005", "F07006", "F07008", "F07009", "F07010", "F07011",
"F07014", "F07015", "F07017", "F07018", "F07019", "F07020", "F07021",
"F07022", "F07023", "F07024", "F10001", "F10004", "F10008", "F10009",
"F10010", "F10012", "F10013", "F98015", "M07007", "M07012", "M07013",
"M07016", "M10007", "M10011", "M10015"), class = "factor"), Season = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L), .Label = c("SUM", "WIN"
), class = "factor"), Time = structure(c(1L, 2L, 1L, 2L, 1L,
2L, 1L, 2L, 1L, 2L, 1L), .Label = c("day", "night"), class = "factor"),
Repro = structure(c(2L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L,
3L), .Label = c("f", "fc", "m"), class = "factor"), Comp1 = c(-0.524557195,
-0.794214153, -0.408247216, -0.621285004, -0.238828585, 0.976634392,
-0.202405922, -0.633821539, -0.306163898, -0.302261589, 1.218779672
), ln1wr = c(0.833126490613386, 0.824526258616325, 0.990730077688989,
0.981816265754353, 0.933462450382474, 1.446048015519, 1.13253050687157,
1.1349442179155, 1.14965388471562, 1.14879830358128, 1.14055365645628
)), .Names = c("ID", "Season", "Time", "Repro", "Comp1",
"ln1wr"), row.names = c(1L, 2L, 3L, 4L, 5L, 220L, 221L, 222L,
223L, 224L, 225L), class = "data.frame")
I would suggest
par(bty="l",las=1)
scatterplot(Comp1~ln1wr|Season, moose,
xlab = "Risk", ylab = "Principal component 1",
labels= row.names(moose),
by.groups=TRUE, smooth=FALSE, boxplots=FALSE,
grid=FALSE,
legend.plot=FALSE)
legend("bottomright", title="Season",
legend=levels(moose$Season), bty="n",
pch=1:2, col=1:2)
As indicated in ?legend, bty controls the legend box -- "n" means "none".
I put the legend in the bottom right rather than in the bottom left because it seems to avoid your data better that way.
I used bty="l" to eliminate the top and right box edges (this means "box type L").
I used las=1 to get the y-axis tick labels horizontal -- you didn't ask for that but I strongly prefer it.
grid=FALSE removes the internal grid lines.
You have to take the unique values of your moose IDs, as you have more than one point for each moose.
legend("bottomleft", legend = unique(moose$ID))
Then you have to associate a color and a point type with each legend entry (corresponding to your moose ID in your plot). I would also have a look at plot() instead of scatterplot().
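A rough sketch of the plot() route, using only base graphics. The four-row data frame is a cut-down, rounded stand-in for moose, and mapping point type and colour to Season (rather than to ID) is an assumption carried over from the first answer.

```r
# Cut-down stand-in for `moose` (values rounded from the sample data)
moose <- data.frame(
  Season = factor(c("SUM", "SUM", "WIN", "WIN")),
  ln1wr  = c(0.83, 0.99, 0.98, 1.45),
  Comp1  = c(-0.52, -0.41, -0.62, 0.98)
)

# One integer code per season, reused for point type and colour
season_id <- as.integer(moose$Season)
season_levels <- levels(moose$Season)

op <- par(bty = "l", las = 1)  # L-shaped box, horizontal tick labels
plot(Comp1 ~ ln1wr, data = moose,
     pch = season_id, col = season_id,
     xlab = "Risk", ylab = "Principal component 1")

# bty = "n" draws the legend without a border
lg <- legend("bottomright", title = "Season",
             legend = season_levels, bty = "n",
             pch = seq_along(season_levels), col = seq_along(season_levels))
par(op)  # restore the previous graphics settings
```

plot() draws no grid by default, so nothing extra is needed to remove the inner grid lines.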
