I need to create a kinship matrix. For this purpose, I wanted to use the kinship2 library in R, but the sex variable is required, which I don't have. From the documentation I read that you can use the value "3" for unknown gender, but it doesn't work. My code where I'm trying to get the effect but it comes out wrong 1x1 matrix.
My data:
nr.os nr.oj nr.ma ferma
1 169 152 84 3
2 170 152 84 3
3 171 152 84 3
4 172 152 84 3
5 173 152 84 3
6 174 152 84 3
My code:
library(kinship2)
my_data <- read.table("Zeszyt_s1.csv",header = TRUE, sep = ";")
my_data$sex <- as.integer(3)
df_fixed <- fixParents(id = my_data$nr.os, dadid=my_data$nr.oj, momid=my_data$nr.ma, sex=my_data$sex)
pedAll <- with(df_fixed,pedigree(
id = id,
dadid = dadid,
momid = momid,
sex = sex))
kinship(pedAll["1"])
Output:
1
1 0.5
Related
I have a table with data on the sales volumes of some products. I want to build several boxplots for each product. I.e. vertically I have sales volume and horizontally I have days. When building, I do not build boxplots in certain values. What is the reason for this?
Here is table:
Day Cottage cheese..pcs. Kefir..pcs. Sour cream..pcs.
1 1 99 103 111
2 2 86 101 114
3 3 92 100 116
4 4 87 112 120
5 5 86 104 111
6 6 88 105 122
7 7 88 106 118
Here is my code:
head(out1)# out1-the table above
boxplot(Day~Cottage cheese..pcs., data = out1)
Here is the result:
Try below:
# example data
out1 <- read.table(text = " Day Cottage.cheese Kefir Sour.cream
1 1 99 103 111
2 2 86 101 114
3 3 92 100 116
4 4 87 112 120
5 5 86 104 111
6 6 88 105 122
7 7 88 106 118", header = TRUE)
# reshape wide-to-long
outlong <- stats::reshape(out1, idvar = "Day", v.names = "value",
time = "product", times = colnames(out1)[2:4],
varying = colnames(out1)[2:4], direction = "long")
# then plot
boxplot(value~product, outlong)
In addition to the provided answer, if you desire to vertically have sales volume and horitontally have days (using the out1 data provided by zx8754).
library(tidyr)
library(data.table)
library(ggplot2)
#data from wide to long
dt <- pivot_longer(out1, cols = c("Kefir", "Sour.cream", "Cottage.cheese"), names_to = "Product", values_to = "Value")
#set dt to data.table object
setDT(dt)
#convert day from integer to a factor
dt[, Day := as.factor(Day)]
#ggplot
ggplot(dt, aes(x = Day, y = Value)) + geom_bar(stat = "identity") + facet_wrap(~Product)
facet_wrap provides separate graphs for the three products.
I created a bar chart here since boxplots would be useless in this case (every product has only one value each day)
I have to match students to their teachers using students' id and their teachers' id. One student can be paired with only one teacher. However, a student may have multiple entries with the same teacher. I want to retain only the cases where students are matched to a single unique teacher. Here is an example to the data frame.
CHID <- c(111,111,111,112,112,113,113,113,113,114), TEAID <- c(115,115,115,162,165,168,168,168,187,119), SCORE <- c(56,56,56,55,55,58,58,58,58,64)
From these data, I want to retain students with CHID's 111 and 114 since they are matched to one and only one unique teacher. Can some please help with an r-code for performing this task? Your assistance will be greatly appreciated.
Here's a solution using dplyr package -
df <- data.frame(
CHID = c(111,111,111,112,112,113,113,113,113,114),
TEAID = c(115,115,115,162,165,168,168,168,187,119),
SCORE = c(56,56,56,55,55,58,58,58,58,64)
)
group_by(df, CHID) %>% filter(n_distinct(TEAID) == 1) %>% ungroup()
# A tibble: 4 x 3
# CHID TEAID SCORE
# <dbl> <dbl> <dbl>
# 1 111 115 56.0
# 2 111 115 56.0
# 3 111 115 56.0
# 4 114 119 64.0
Here's a solution without any foreign packages -
df[ave(df$TEAID, df$CHID, FUN = function(x) length(unique(x))) == 1, ]
# CHID TEAID SCORE
# 1 111 115 56
# 2 111 115 56
# 3 111 115 56
# 10 114 119 64
I am looking to workout a percentage total over a look back range in R.
I know how to do this in excel with the following formula:
=SUM(B2:B4)/SUM(B2:B4,C2:C4)
This is summing column B over a range of today looking back 3 lines. It then divides this sum buy the total sum of column B + C again looking back 3 lines.
I am looking to achieve the same calculation in R to run across my matrix.
The output would look something like this:
adv dec perct
1 69 376
2 113 293
3 270 150 0.355625492
4 74 371 0.359559402
5 308 96 0.513790386
6 236 173 0.491255962
7 252 134 0.663886572
8 287 129 0.639966969
9 219 187 0.627483444
This is a line of code I could perhaps add the look back range too:
perct <- apply(data.matrix[,c('adv','dec')], 1, function(x) { (x[1] / x[1] + x[2]) } )
If i could get [1] to sum the previous 3 line range and
If i could get [2] to also sum the previous 3 line range.
Still learning how to apply forward and look back periods within R. So any additional learning on the answer would be appreciated!
Here are some approaches. The first 3 use rollsumr and/or rollapplyr in zoo and the last one uses only the base of R.
1) rollsumr Create a matrix with rollsumr whose columns contain the rollling sums, convert that to row proportions and take the "adv" column. Finally assign that to a new column frac in DF. This approach has the shortest code.
library(zoo)
DF$frac <- prop.table(rollsumr(DF, 3, fill = NA), 1)[, "adv"]
giving:
> DF
adv dec frac
1 69 376 NA
2 113 293 NA
3 270 150 0.3556255
4 74 371 0.3595594
5 308 96 0.5137904
6 236 173 0.4912560
7 252 134 0.6638866
8 287 129 0.6399670
9 219 187 0.6274834
1a) This variation is similar except instead of using prop.table we write out the ratio. The code is longer but you may find it clearer.
m <- rollsumr(DF, 3, fill = NA)
DF$frac <- with(as.data.frame(m), adv / (adv + dec))
1b) This is a variation of (1) that is the same except it uses a magrittr pipeline:
library(magrittr)
DF %>% rollsumr(3, fill = NA) %>% prop.table(1) %>% `[`(TRUE, "adv") -> DF$frac
2) rollapplyr We could use rollapplyr with by.column = FALSE like this. The result is the same.
ratio <- function(x) sum(x[, "adv"]) / sum(x)
DF$frac <- rollapplyr(DF, 3, ratio, by.column = FALSE, fill = NA)
3) Yet another variation is to compute the numerator and denominator separately:
DF$frac <- rollsumr(DF$adv, 3, fill = NA) /
rollapplyr(DF, 3, sum, by.column = FALSE, fill = NA)
4) base This uses embed followed by rowSums on each column to get the rolling sums and then uses prop.table as in (1).
DF$frac <- prop.table(sapply(lapply(rbind(NA, NA, DF), embed, 3), rowSums), 1)[, "adv"]
Note: The input used in reproducible form is:
Lines <- "adv dec
1 69 376
2 113 293
3 270 150
4 74 371
5 308 96
6 236 173
7 252 134
8 287 129
9 219 187"
DF <- read.table(text = Lines, header = TRUE)
Consider an sapply that loops through the number of rows in order to index two rows back:
DF$pred <- sapply(seq(nrow(DF)), function(i)
ifelse(i>=3, sum(DF$adv[(i-2):i])/(sum(DF$adv[(i-2):i]) + sum(DF$dec[(i-2):i])), NA))
DF
# adv dec pred
# 1 69 376 NA
# 2 113 293 NA
# 3 270 150 0.3556255
# 4 74 371 0.3595594
# 5 308 96 0.5137904
# 6 236 173 0.4912560
# 7 252 134 0.6638866
# 8 287 129 0.6399670
# 9 219 187 0.6274834
I want to make a SummarizedExperiment,
I have the count table in this format in FeatureCount.txt
SRR1554537 SRR1554538 SRR1554541 SRR1554535 SRR1554536 SRR1554539
1/2-SBSRNA4 39 66 72 23 16 7
A1BG 221 113 226 146 36 126
A1BG-AS1 393 296 527 276 39 258
A1CF 8 7 5 1 0 4
A2LD1 97 208 171 181 72 110
I have the phenotype Data in this format:
SampleName RUN Age sex tissue disease
SRR1554537 R3452_DLPFC_polyA_RNAseq_total SRR1554537 -0.384 female DLPFC control
SRR1554538 R3462_DLPFC_polyA_RNAseq_total SRR1554538 -0.4027 female DLPFC control
SRR1554541 R3485_DLPFC_polyA_RNAseq_total SRR1554541 -0.3836 male DLPFC control
SRR1554535 R2869_DLPFC_polyA_RNAseq_total SRR1554535 41.58 male DLPFC control
SRR1554536 R3098_DLPFC_polyA_RNAseq_total SRR1554536 44.17 female DLPFC control
SRR1554539 R3467_DLPFC_polyA_RNAseq_total SRR1554539 36.5 female DLPFC control
Here is my code:
count_feature <- as.matrix(read.table("featureCount.txt", header = TRUE, stringsAsFactors = FALSE))
phenoData <- read.csv("Pheno_Data.csv", header = TRUE)
col_data <- DataFrame(phenoData)
row_data <- relist(GRanges(), vector("list", length= nrow(count_feature)))
mcols(row_data) <- rownames(count_feature)
Brain_Es <- SummarizedExperiment( assays = list(feature_Count= feature_Count), rowRanges = row_data, colData = col_data)
Error in rownames<-(*tmp*, value = c("X", "SRR1554537", "SRR1554538", :
invalid rownames length
Can you explain the error?
I don't understand what you're trying to do with row_data, but it's clearly not working. You already have gene names from the count table. Why not do
Brain_Es <- SummarizedExperiment(assays = list(counts = count_feature), colData = col_data, rowData = rownames(count_feature));
Have a look at ?SummarizedExperiment, and at the examples given here in section "Constructing a SummarizedExperiment".
I have a binomail dataset that looks like this:
df <- data.frame(replicate(4,sample(1:200,1000,rep=TRUE)))
addme <- data.frame(replicate(1,sample(0:1,1000,rep=TRUE)))
df <- cbind(df,addme)
df <-df[order(df$replicate.1..sample.0.1..1000..rep...TRUE..),]
The data is currently soreted in a way to show the instances belonging to 0 group then the ones belonging to the 1 group. Is there a way I can sort the data in a 0-1-0-1-0... fashion? I mean to show a row that belongs to the 0 group, the row after belonging to the 1 group then the zero group and so on...
All I can think about is complex functions. I hope there's a simple way around it.
Thank you,
Here's an attempt, which will add any extra 1's at the end:
First make some example data:
set.seed(2)
df <- data.frame(replicate(4,sample(1:200,10,rep=TRUE)),
addme=sample(0:1,10,rep=TRUE))
Then order:
with(df, df[unique(as.vector(rbind(which(addme==0),which(addme==1)))),])
# X1 X2 X3 X4 addme
#2 141 48 78 33 0
#1 37 111 133 3 1
#3 115 153 168 163 0
#5 189 82 70 103 1
#4 34 37 31 174 0
#6 189 171 98 126 1
#8 167 46 72 57 0
#7 26 196 30 169 1
#9 94 89 193 134 1
#10 110 15 27 31 1
#Warning message:
#In rbind(which(addme == 0), which(addme == 1)) :
# number of columns of result is not a multiple of vector length (arg 1)
Here's another way using dplyr, which would make it suitable for within-group ordering. It's also probably pretty quick. If there's unbalanced numbers of 0's and 1's, it will leave them at the end.
library(dplyr)
df %>%
arrange(addme) %>%
mutate(n0 = sum(addme == 0),
orderme = seq_along(addme) - (n0 * addme) + (0.5 * addme)) %>%
arrange(orderme) %>%
select(-n0, -orderme)