Transposing column on data frame from multiple different tables - r

I'm working with RStudio version 1.0.143.
I need to transpose one column so that its values become variable names, without losing the associated values (Freq_var and Freq_mut in this example).
I also need to know which table each row came from (Sample 1, Sample 2, etc.).
The main problem is how to combine everything even when a value of Gene is absent from Sample 1 but present in Sample 2 (the NAs in the example).
I could do it manually, but my table contains thousands of values for each variable!
Sample 1
Freq_var  Freq_mut  Gene
2         2         A
3         3         B
2         5         C
Sample 2
Freq_var  Freq_mut  Gene
1         2         A
1         1         B
1         1         D
To:
          A(Freq_var)  B(Freq_var)  C(Freq_var)  D(Freq_var)  A(Freq_mut) .....
Sample 1  2            3            2            NA           2
Sample 2  1            1            NA           1            2
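One possible approach, sketched in base R: stack the tables with a column recording which sample each row came from, then reshape to wide format so that every Gene/variable pair becomes its own column. The object names (sample1, sample2, samples, long, wide) are made up for illustration, and the resulting columns are named like Freq_var.A rather than A(Freq_var).

```r
# The two example tables from the question
sample1 <- data.frame(Freq_var = c(2, 3, 2), Freq_mut = c(2, 3, 5),
                      Gene = c("A", "B", "C"), stringsAsFactors = FALSE)
sample2 <- data.frame(Freq_var = c(1, 1, 1), Freq_mut = c(2, 1, 1),
                      Gene = c("A", "B", "D"), stringsAsFactors = FALSE)

# Keep the tables in a named list so the names can become a Sample column
samples <- list("Sample 1" = sample1, "Sample 2" = sample2)

# Stack all tables into one long data frame, tagging each row with its sample
long <- do.call(rbind, Map(function(d, nm) {
  data.frame(Sample = nm, d, stringsAsFactors = FALSE)
}, samples, names(samples)))

# One row per Sample, one column per Gene/variable pair;
# combinations that never occur (e.g. Gene D in Sample 1) become NA
wide <- reshape(long, idvar = "Sample", timevar = "Gene", direction = "wide")
print(wide)
```

With thousands of genes the same code should apply unchanged, as long as each Gene appears at most once per sample.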

Related

If I want to sort a column by size in RStudio, how do I make sure that the associated values of the rows sort with the column?

I have a data.frame with 1200 rows and 5 columns, where each row contains 5 values for one person. Now I need to sort one column by size, but I want the remaining columns to be reordered along with it, so that one column is sorted by increasing values and each row still contains the data of one and the same person.
colnames(BAPlotDET) = c("fsskiddet", "fspiddet","avg", "diff","absdiff")
These are the column names of my data.frame, and I want to sort it by the column called "avg".
First of all, please always provide a reproducible example such as the one below. Indexing a data frame with order() reorders whole rows, so all columns are sorted together.
vector <- 1:3
BAPlotDET <- data.frame(vector, vector, vector, vector, vector)
colnames(BAPlotDET) = c("fsskiddet", "fspiddet","avg", "diff","absdiff")
  fsskiddet fspiddet avg diff absdiff
1         1        1   1    1       1
2         2        2   2    2       2
3         3        3   3    3       3
# order(-x) sorts in decreasing order; drop the minus sign for increasing order
BAPlotDET <- BAPlotDET[order(-BAPlotDET$avg),]
> BAPlotDET
  fsskiddet fspiddet avg diff absdiff
3         3        3   3    3       3
2         2        2   2    2       2
1         1        1   1    1       1
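Because every column in the example above holds the same values, it is hard to see that rows stay intact. Here is a small sketch with made-up data (the people data frame and its values are purely illustrative) that sorts by increasing avg, as the question asks:

```r
# Hypothetical data: each row belongs to one person
people <- data.frame(
  id   = c("p1", "p2", "p3", "p4"),
  avg  = c(3.2, 1.5, 2.8, 0.9),
  diff = c(0.4, -0.1, 0.3, 0.2),
  stringsAsFactors = FALSE
)

# order() returns the row permutation that sorts avg in increasing order;
# using it as a row index moves every column together
sorted <- people[order(people$avg), ]
print(sorted)
#   id avg diff
# 4 p4 0.9  0.2
# 2 p2 1.5 -0.1
# 3 p3 2.8  0.3
# 1 p1 3.2  0.4
```

The row names on the left show where each row originally sat, which is a quick way to confirm that id and diff travelled with avg.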

Compare lists in dataframes based on personal code, shorten one list if longer

I have two separate dataframes each for one speaker of an interacting dyad. They have different amounts of talk-turns (rows) which is why I keep them in separate files for now.
In order to run my final analyses I need identical number of rows for each speaker.
So what I want to do is compare dyad_id 1 in both data frames and then shorten the longer one by deleting its last row(s) across all columns, and likewise for every other dyad_id.
I prepared a data frame to illustrate what I already have.
So far, I tried to split the data frame by the dyad_id in both data sets to now compare the splits one after another and delete the unnecessary rows. As I have various conversations, I need to automate this to go through all dyad_ids one after another.
I hope someone can help me, I am completely lost.
dyad_id_A <- c(1,1,1,2,2,2,2,3,3,3,3,3)
fw_quantiles_a <- c(4,3,1,2,3,2,4,1,4,5,6,7)
df_A<- data.frame(dyad_id_A,fw_quantiles_a)
dyad_id_B <- c(1,1,1,1,2,2,2,3,3,3,3)
fw_quantiles_b <- c(3,1,2,1,2,4,1,3,3,4,5)
df_B <- data.frame(dyad_id_B,fw_quantiles_b)
Example of the dyad_id column in the final dataset:
dyad_id_AB <- c(1,1,1,2,2,2,3,3,3,3)
What I tried so far:
split_conv_A = split(df_A, list(df_A$dyad_id_A))
split_conv_B = split(df_B, list(df_B$dyad_id_B))
Add a time counter within each dyad_id_x group and then merge together:
df_A$time <- ave(df_A$dyad_id_A, df_A$dyad_id_A, FUN=seq_along)
df_B$time <- ave(df_B$dyad_id_B, df_B$dyad_id_B, FUN=seq_along)
merge(
  df_A, df_B,
  by.x=c("dyad_id_A","time"), by.y=c("dyad_id_B","time")
)
#   dyad_id_A time fw_quantiles_a fw_quantiles_b
#1          1    1              4              3
#2          1    2              3              1
#3          1    3              1              2
#4          2    1              2              2
#5          2    2              3              4
#6          2    3              2              1
#7          3    1              1              3
#8          3    2              4              3
#9          3    3              5              4
#10         3    4              6              5
Maybe we can use table to calculate the frequencies of the ids in both data frames, assuming the same ids occur in both. Take the element-wise minimum of the two with pmin, then repeat each id name that many times.
tab <- pmin(table(df_A$dyad_id_A), table(df_B$dyad_id_B))
as.integer(rep(names(tab), tab))
# [1] 1 1 1 2 2 2 3 3 3 3
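If the two data frames should stay separate but end up with the same number of rows per dyad, the counting idea can be used to trim each one. This is only a sketch under the same assumption as above (every dyad_id occurs in both data frames); the helper name keep_rows is made up for illustration.

```r
dyad_id_A <- c(1,1,1,2,2,2,2,3,3,3,3,3)
fw_quantiles_a <- c(4,3,1,2,3,2,4,1,4,5,6,7)
df_A <- data.frame(dyad_id_A, fw_quantiles_a)
dyad_id_B <- c(1,1,1,1,2,2,2,3,3,3,3)
fw_quantiles_b <- c(3,1,2,1,2,4,1,3,3,4,5)
df_B <- data.frame(dyad_id_B, fw_quantiles_b)

# Hypothetical helper: TRUE for rows whose position within their dyad
# does not exceed the number of rows the partner has for that dyad
keep_rows <- function(own_id, partner_id) {
  pos <- ave(seq_along(own_id), own_id, FUN = seq_along)  # 1, 2, 3, ... per dyad
  partner_counts <- table(partner_id)
  n_partner <- as.integer(partner_counts)[match(own_id, names(partner_counts))]
  pos <= n_partner
}

df_A_trim <- df_A[keep_rows(df_A$dyad_id_A, df_B$dyad_id_B), ]
df_B_trim <- df_B[keep_rows(df_B$dyad_id_B, df_A$dyad_id_A), ]

df_A_trim$dyad_id_A
# [1] 1 1 1 2 2 2 3 3 3 3
```

Only trailing rows of the longer side are dropped, so the remaining rows keep their original order within each dyad.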

Go through a column and collect a running total in new column [duplicate]

I have a dataframe whose rows represent people. For a given family, the first row has the value 1 in column A, and all following rows contain members of the same family until another row has the value 1 in column A. Then a new family starts.
I would like to assign IDs to all families in my dataset. In other words, I would like to take:
A
1
2
3
1
3
3
1
4
And turn it into:
A family_id
1 1
2 1
3 1
1 2
3 2
3 2
1 3
4 3
I'm playing with a dataframe of 3 million rows, so a simple for-loop solution I came up with falls short of necessary efficiency. Also, the family_id need not be sequential.
I'll take a dplyr solution.
data:
df <- data.frame(A = c(1:3,1,3,3,1,4))
code:
df$family_id <- cumsum(c(-1,diff(df$A)) < 0)
result:
#   A family_id
#1  1         1
#2  2         1
#3  3         1
#4  1         2
#5  3         2
#6  3         2
#7  1         3
#8  4         3
Please note:
This solution starts a new group whenever a number is smaller than the previous one.
If a new group always begins with a 1, then Ronak's solution is perfect.
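The note above contrasts two grouping rules. A minimal sketch of how they can differ, assuming the alternative rule is "start a new family wherever A equals 1" (the other answer is not shown here, so cumsum(A == 1) is an assumption about its form, and the vector A below is made up for illustration):

```r
# Made-up input where a value drops without restarting at 1
A <- c(1, 3, 2, 1, 2)

# Rule from the answer above: new group whenever the value decreases
by_drop <- cumsum(c(-1, diff(A)) < 0)
by_drop
# [1] 1 1 2 3 3

# Assumed alternative: new group only where A equals 1
by_one <- cumsum(A == 1)
by_one
# [1] 1 1 1 2 2
```

Both are vectorised, so either should scale to millions of rows; which one is right depends on whether a decrease that is not a 1 can occur inside a family.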

Pulling Specific Row Values based on Another Column

Simple question here.
If I have a dataframe such as:
> dat
  typeID ID  modelOption
1      2  1         good
2      2  2          avg
3      2  3          bad
4      2  4   marginCost
5      1  5 year1Premium
6      1  6         good
7      1  7          avg
8      1  8          bad
and I want to pull only the modelOption values for a given typeID, how do I do that? I know I can subset all rows corresponding to the typeID, but in this case I just want the modelOption values.
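One way to do this, sketched in base R: index the modelOption column directly with a logical condition on typeID. The data frame is rebuilt here from the printed example; the result names opts_type1 and opts_by_type are made up for illustration.

```r
dat <- data.frame(
  typeID = c(2, 2, 2, 2, 1, 1, 1, 1),
  ID = 1:8,
  modelOption = c("good", "avg", "bad", "marginCost",
                  "year1Premium", "good", "avg", "bad"),
  stringsAsFactors = FALSE
)

# Values of modelOption for a single typeID, returned as a plain vector
opts_type1 <- dat$modelOption[dat$typeID == 1]
opts_type1
# [1] "year1Premium" "good"         "avg"          "bad"

# Values for every typeID at once, as a list named by typeID
opts_by_type <- split(dat$modelOption, dat$typeID)
opts_by_type[["2"]]
# [1] "good"       "avg"        "bad"        "marginCost"
```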

Using Merge with an R By class object

So I have a "by" class object (which is essentially a list).
It is indexed by 2 factors [id1,id2], with a list associated with each unique pair.
e.g.
id1:1
id2:1
1,2,3
------
id1:1
id2:2
4,4,NA
------
id1:2
id2:1
NA
I would like to convert this to a data frame which has 3 columns {id1,id2,value} and would take the above and return
id1  id2  value
1    1    1
1    1    2
1    1    3
1    2    4
1    2    4
1    2    NA
2    1    NA
This can be done with a for loop but is obviously slow. I am looking to try and merge the value column back to a data frame which has indices 1 and 2.
Answer: Use the data.table package. It is ridiculously quick for these sorts of problems.
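The answer does not show code, so here is a loop-free sketch in base R rather than data.table (a swapped-in technique, not the answer's method). It assumes the "by" object was built with by() over two index columns and that each cell holds a plain vector; src, by_obj, cells and out are illustrative names.

```r
# Rebuild a "by" object matching the example in the question
src <- data.frame(id1 = c(1, 1, 1, 1, 1, 1, 2),
                  id2 = c(1, 1, 1, 2, 2, 2, 1),
                  value = c(1, 2, 3, 4, 4, NA, NA))
by_obj <- by(src, src[c("id1", "id2")], function(d) d$value)

# All (id1, id2) combinations, in the same order as the cells of by_obj
cells <- expand.grid(dimnames(by_obj), stringsAsFactors = FALSE)

# Treat the object as a plain list of cells; empty combinations are NULL
cell_values <- unclass(by_obj)
n <- as.vector(lengths(cell_values))

# Repeat each id pair once per value in its cell, then flatten the values
out <- data.frame(id1 = rep(cells[[1]], n),
                  id2 = rep(cells[[2]], n),
                  value = unlist(cell_values, use.names = FALSE),
                  stringsAsFactors = FALSE)
out <- out[order(out$id1, out$id2), ]
print(out)
```

The id columns come back as character because they are taken from the dimnames; convert them with as.integer() if numeric keys are needed before merging.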
