R Pooled DataFrame analysis - r

I'm trying to perform several analyses on subsets of data in a dataframe in R, and I was wondering if there is a generic way of doing this.
Say, I have a dataframe like:
     one two three four
[1,]   1   6    11   16
[2,]   2   7    12   17
[3,]   3   8    11   18
[4,]   4   9    11   19
[5,]   5  10    15   20
How could I apply some computation (e.g. cumulative counting) to the values in col "one", conditional on (grouped by) the value in col "three"?
That is, I want to do something to one column based on the grouping in another column. I can do this with loops, but I feel there might be a standard way to do this all at once.
Thank you in advance!

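ddply(data, .(coln), Stat) from the plyr package does exactly this: it splits data by the grouping column coln, applies the function Stat to each piece, and combines the results.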
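A minimal base-R sketch of the same split-apply idea, assuming the example data from the question is stored in a data frame called df (a name chosen here for illustration, as are the new column names). ave() applies a function within each group and returns the result in the original row order, so it can be assigned straight back as a new column.

```r
# Example data from the question (df is an illustrative name)
df <- data.frame(one = 1:5, two = 6:10,
                 three = c(11, 12, 11, 11, 15), four = 16:20)

# Running total of "one" within each group of "three"
df$cum_one <- ave(df$one, df$three, FUN = cumsum)

# Running count of rows within each group of "three"
df$cum_n <- ave(seq_along(df$three), df$three, FUN = seq_along)

print(df)
```

Any function that returns one value per input row (cumsum, rank, seq_along, ...) can be passed as FUN in the same way.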

Related

Create dataframe with missing data

I'm very new to R, so please excuse my potentially noob question.
I have hormone concentration data from 23 individuals, collected hourly. I've interpolated between the hourly collections to get concentrations from 2.0 to 15.0 pg/ml at intervals of 0.1, which comes to 131 rows of data per individual.
Some individuals' concentrations, however, don't go beyond 6.0 pg/ml (for example), which means I have dataframes with unequal numbers of rows across individuals. I need all individuals to have 131 rows for the next step, where I combine all the data.
I've tried to create a dataframe of NAs with 131 rows and two columns, and then add the individual's interpolated data into the NA dataframe, so that the end result is a 131-row dataframe with missing data as NA, but it's not going so well.
interp_saliva_002_x <- as.tibble(matrix(, nrow = 131, ncol = 1))
interp_sequence <- as.numeric(seq(2,15,.1))
interp_saliva_002_x[1] <- interp_sequence
colnames(interp_saliva_002_x)[1] <- "saliva_conc"
test <- left_join(interp_saliva_002_x, interp_saliva_002, by = "saliva_conc")
Can you help me to understand where I'm going wrong or is there a more logical way to do this?
Thank you!
Let's assume you have 3 vectors with different lengths:
A<-seq(1,5); B<-seq(2,8); C<-seq(3,5)
Change the length of the vectors to the length that you want (in your case it's 131; I picked 7 for simplicity):
length(A)<-7; length(B)<-7; length(C)<-7 # extending a vector pads the new positions with NA
Next you can cbind the vectors to a matrix:
m <-cbind(A,B,C)
# A B C
#[1,] 1 2 3
#[2,] 2 3 4
#[3,] 3 4 5
#[4,] 4 5 NA
#[5,] 5 6 NA
#[6,] NA 7 NA
#[7,] NA 8 NA
You can also change your matrix to a dataframe:
df<-as.data.frame(m)
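For the join-based approach in the question, here is a hedged base-R sketch. The data frame indiv and its hormone column are made-up stand-ins for one individual's interpolated data, not the asker's real objects. Both key columns are rounded to one decimal so that floating-point noise from seq() does not prevent matches.

```r
# Full concentration grid: 2.0 to 15.0 in steps of 0.1 (131 values)
grid <- data.frame(saliva_conc = round(seq(2, 15, by = 0.1), 1))

# Hypothetical individual whose data stop at 6.0 pg/ml (41 rows)
indiv <- data.frame(saliva_conc = round(seq(2, 6, by = 0.1), 1))
indiv$hormone <- indiv$saliva_conc * 2   # made-up values for illustration

# Left join in base R: keep every grid row, fill unmatched rows with NA
padded <- merge(grid, indiv, by = "saliva_conc", all.x = TRUE)

print(nrow(padded))
```

Repeating this for each individual gives data frames that all have 131 rows and can be combined directly.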

Getting a difference between time(n+1)-time(n) in a dataframe in r

I have a dataframe where the columns represent monthly data and the rows represent different simulations. The data I am working with accumulates over time, so I want to take the difference between consecutive months to get the true value for each month. There are no headers in my dataframe.
For example:
View(df)=
1 3 4 6 19 23 24 25 26 ...
1 2 3 4 5 6 7 8 9 ...
0 0 2 3 5 7 14 14 14 ...
My plan was to use the diff() function or something like it, but I am having trouble using it on a dataframe.
I have tried:
df1<-diff(df, lag = 1, differences = 1)
but only get zeros.
I am grateful for any advice.
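See ?apply. diff() works on vectors, so apply it along the dimension you want. Your months are in columns and each row is a simulation, so difference along each row and transpose the result:
t(apply(df, 1, diff))
apply(df, 2, diff) would instead difference down each column (between simulations), as would sapply(df, diff), since a dataframe is a list of column vectors.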
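A small sketch using the three example rows from the question, assuming months run across columns. The names m, increments and monthly are illustrative. For a dataframe, as.matrix(df) can be used in place of m.

```r
# Cumulative data from the question: rows are simulations, columns are months
m <- rbind(c(1, 3, 4, 6, 19, 23, 24, 25, 26),
           1:9,
           c(0, 0, 2, 3, 5, 7, 14, 14, 14))

# diff() along each row; apply() returns the results as columns, so transpose
increments <- t(apply(m, 1, diff))

# Keep the first month as-is and append the month-to-month increments
monthly <- cbind(m[, 1], increments)

print(monthly)
```

Binding the first column back on keeps the result the same width as the input, since diff() returns one fewer value than it is given.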

Finding the index of the minimum value which is larger than a threshold in R

This is probably very simple, but I'm missing the correct syntax in order to simplify it.
Given a matrix, find the entry in one column which is the lowest value, greater than some input parameter. Then, return an entry in a different column on that corresponding row. Not very complicated... and I've found something that works but, a more efficient solution would be greatly appreciated.
I found this link: "Better way to find a minimum value that fits a condition?"
which is great.. but that method of finding the least entry loses the index information required to find a corresponding value in a corresponding row.
Let's say column 2 is the condition column, and column 1 is the one I want to return.... currently I've made this: (note that this only works because row two is full of numbers which are less than 1).
matrix[which.max((matrix[,2]>threshhold)/matrix[,2]),1]
Any thoughts? I'm expecting that there is probably some quick and easy function which has this effect... it's just never been introduced to me haha.
rmk's answer shows the basic way to get a lot of info out of your matrix. But if you know which column you're testing for the minimum value (above your threshold), and then want to return a different value in that row, maybe something like
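incol <- df[,4] # select the column to search
outcol <- 2 # select the element of the found row you want to get
threshold <- 5
idx <- which(incol > threshold) # rows whose entry exceeds the threshold
df[ idx[which.min(incol[idx])] ,outcol] # value from the row holding the smallest such entry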
You could try the following. Say,
df <- matrix(sample(1:35,35),7,5)
> df
     [,1] [,2] [,3] [,4] [,5]
[1,]   18   16   27   19   31
[2,]   24    1    7   12    5
[3,]   28   35   23    4    6
[4,]   33    3   25   26   15
[5,]   14   10   11   21   20
[6,]    9    2   32   17   13
[7,]   30    8   29   22   34
Say your threshold is 5:
apply(df,2,function(x){ x[x<5] <- max(x);which.min(x)})
[1] 6 7 2 2 2
Corresponding to the values:
[1] 9 8 7 12 5
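This gives, for each column, the row index (in the original row numbering) of the smallest entry that is at least the threshold. The code only replaces values strictly below 5, which is why the 5 in the last column is picked; use x <= 5 if the entry must be strictly greater than the threshold.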
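Putting the pieces together for the original question (search one column, return a value from another), here is a sketch with a hypothetical helper, lookup_min_above, written for illustration. It uses the example matrix printed above, typed in by hand so the result does not depend on sample().

```r
# The example matrix from above, entered column by column
m <- matrix(c(18, 24, 28, 33, 14,  9, 30,
              16,  1, 35,  3, 10,  2,  8,
              27,  7, 23, 25, 11, 32, 29,
              19, 12,  4, 26, 21, 17, 22,
              31,  5,  6, 15, 20, 13, 34), nrow = 7)

# Hypothetical helper: find the smallest entry of search_col that is
# strictly greater than threshold, and return return_col from that row
lookup_min_above <- function(m, search_col, return_col, threshold) {
  idx <- which(m[, search_col] > threshold)   # candidate rows
  if (length(idx) == 0) return(NA)            # nothing exceeds the threshold
  m[idx[which.min(m[idx, search_col])], return_col]
}

res_a <- lookup_min_above(m, search_col = 2, return_col = 1, threshold = 5)
res_b <- lookup_min_above(m, search_col = 4, return_col = 2, threshold = 5)
res_c <- lookup_min_above(m, search_col = 2, return_col = 1, threshold = 100)
```

Subsetting with which() first keeps the original row indices, which is the information lost when the column is simply filtered before taking min().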

Re-sample a data frame with panel dimension

I have a data set consisting of 2000 individuals. For each individual i = 1, ..., 2000, the data set contains n repeated situations. Letting d denote this data set, each row of d is indexed by i and n. Among other variables, d has a variable pid that takes the same value across all rows (situations) belonging to one individual.
Taking into consideration the panel nature of the data, I want to re-sample d (as in bootstrap):
with replacement,
store each re-sample data as a data frame
I considered using the sample function but could not make it work. I am a new user of r and have no programming skills.
The data set consists of many variables, but all the variables have numeric values. The data set is as follows.
pid x y z
1 10 2 -5
1 12 3 -4.5
1 14 4 -4
1 16 5 -3.5
1 18 6 -3
1 20 7 -2.5
2 22 8 -2
2 24 9 -1.5
2 26 10 -1
2 28 11 -0.5
2 30 12 0
2 32 13 0.5
The first six rows are for the first person, for whom pid=1, and the next six rows, with pid=2, are different observations for the second person.
This should work for you:
z <- replicate(100,
d[d$pid %in% sample(unique(d$pid), 2000, replace=TRUE),],
simplify = FALSE)
The result z will be a list of dataframes you can do whatever with.
EDIT: this is a little wordy, but it deals with pids that are drawn more than once. replicate performs an operation a given number of times (4 in the example below). Within each iteration I sample the unique values of pid (here 3 of them, with replacement) and extract the rows of d corresponding to each sampled value. The code above uses %in%, which keeps each sampled pid's rows only once no matter how many times that pid was drawn, so it produces dataframes of varying length. Here, lapply builds one dataframe per sampled pid and do.call("rbind", ...) sticks them back together within each iteration of replicate, so a pid drawn twice contributes its rows twice.
z <- replicate(4, do.call("rbind", lapply(sample(unique(d$pid),3,replace=TRUE),
function(x) d[d$pid==x,])),
simplify=FALSE)
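A self-contained sketch of the second version on the two-person example data from the question. The seed, the number of replicates (3) and the name boot are arbitrary choices made for illustration. With the real data, the same code would draw 2000 pids per resample because n_ids is taken from the data.

```r
# Example panel from the question: two individuals, six rows each
d <- data.frame(pid = rep(1:2, each = 6),
                x = seq(10, 32, by = 2),
                y = 2:13,
                z = seq(-5, 0.5, by = 0.5))

set.seed(1)                        # arbitrary seed, for reproducibility only
n_ids <- length(unique(d$pid))     # draw as many individuals as in the data

# Each resample: draw pids with replacement, then stack each drawn
# individual's full block of rows (repeats included)
boot <- replicate(3, {
  ids <- sample(unique(d$pid), n_ids, replace = TRUE)
  do.call(rbind, lapply(ids, function(id) d[d$pid == id, ]))
}, simplify = FALSE)

print(sapply(boot, nrow))
```

Because whole individuals are drawn, every resample keeps each person's rows together, which is what preserves the panel structure.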

Perform 'cross product' of two vectors, but with addition

I am trying to use R to perform an operation (ideally with similarly displayed output) such as
> x<-1:6
> y<-1:6
> x%o%y
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    2    3    4    5    6
[2,]    2    4    6    8   10   12
[3,]    3    6    9   12   15   18
[4,]    4    8   12   16   20   24
[5,]    5   10   15   20   25   30
[6,]    6   12   18   24   30   36
where each entry is found through addition not multiplication.
I would also be interested in creating the 36 ordered pairs (1,1) , (1,2), etc...
Furthermore, I want to use another vector like
z<-1:4
to create all the ordered triplets possible between x, y, and z.
I am using R to look into the likelihoods of the possible totals when rolling dice with varying numbers of sides.
Thank you for all your help! This site has been a big help to me. I appreciate anyone that takes the time to answer a stranger's question.
UPDATE: I found that outer(x,y,'+') does what I wanted first. But I still don't know how to create ordered pairs or ordered triplets.
Your first question is easily handled by outer:
outer(1:6,1:6,"+")
For the others, I suggest you try expand.grid, although there are specialized combination and permutation functions out there as well if you do a little searching.
expand.grid can answer your second question:
expand.grid(1:6,1:6)
expand.grid(1:6,1:6,1:4)
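For the dice use case mentioned in the question, a short sketch that combines the two calls above. It assumes two six-sided dice and one four-sided die, all fair. The column names and the names rolls and probs are illustrative.

```r
# Sum table for two six-sided dice
sums <- outer(1:6, 1:6, "+")

# All ordered triplets for two six-sided dice and one four-sided die
rolls <- expand.grid(x = 1:6, y = 1:6, z = 1:4)

# Total of each triplet, computed before the column is added
rolls$total <- rowSums(rolls)

# Probability of each total, assuming every triplet is equally likely
probs <- table(rolls$total) / nrow(rolls)

print(probs)
```

Each row of rolls is one equally likely outcome, so counting rows per total and dividing by the number of rows gives the distribution of the total.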
