I am trying to create a subset from an existing data frame where the variable "Readings" displays values which are greater than the previous reading, as well as the corresponding row entry for the "Time" variable.
The code I have written below only produces "NA" entries.
Data$Readings <- 0
for (i in 1:nrow(Data)) {
  Pos.Readings <- Data[Data$Readings[i+1] > Data$Readings[i], ]
}
Pos.Readings
I would like the new data frame to contain rows i and i+1 whenever Readings[i+1] > Readings[i].
Here is an example of the data
Time Readings
12:00:00 0.1
12:00:01 0.3
12:00:02 0.45
12:00:03 0.2
12:00:04 0.02
12:00:05 -0.7
12:00:06 -0.25
12:00:07 0.27
So, what I am aiming for should look like:
Time Readings
12:00:00 0.1
12:00:01 0.3
12:00:02 0.45
12:00:05 -0.7
12:00:06 -0.25
12:00:07 0.27
I have probably gone about writing the for loop incorrectly, but I hope my intentions are clear to all.
It looks like you care about the absolute value of readings being greater than the previous. If that is the case, try this:
comparisons <- Data$Readings[-nrow(Data)]
Data$prevReading <- 0 #or just a really small number that automatically keeps row 1
Data$prevReading[-1] <- comparisons
subsetData <- subset(Data, abs(prevReading) < abs(Readings))
subsetData <- subsetData[c("Time", "Readings")]
If you wanted the actual readings being compared and not the absolute values, just get rid of the two abs() commands when you subset.
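For what it's worth, the expected output in the question actually matches a different reading: keep both row i and row i+1 whenever Readings[i+1] > Readings[i] on the raw values. Here is a minimal vectorized sketch under that assumption (no abs()):

up <- diff(Data$Readings) > 0         # TRUE where row i+1 exceeds row i
keep <- c(up, FALSE) | c(FALSE, up)   # keep a row if it is the i or the i+1 of an increasing pair
Pos.Readings <- Data[keep, c("Time", "Readings")]

On the example data this keeps 12:00:00 through 12:00:02 and 12:00:05 through 12:00:07, which is exactly the table the question asks for.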
In my data set in R, respondents were exposed to stimuli, and their reactions were studied at baseline, one hour after exposure, and two hours after. In R, I adjusted the data by baseline. Here is an example of what my data looks like:
stimuli_no base hour two_hour
1 0 0.02 -0.10
2 0 0.01 -0.03
3 0 -0.01 0.02
1 0 -0.05 -0.06
2 0 0.03 0.05
3 0 0.02 0.04
The first thing I want to do is get the mean of each time interval by stimuli_no, which I did with this code:
transform(df, m_base = ave(base, stimuli_no), m_hour = ave(hour, stimuli_no), m_twoh = ave(two_hour, stimuli_no))
Now I want to make a line graph that has the time intervals of baseline, hour, and two hour on the x axis and the scores on the y axis, with separate lines for each of the stimuli.
Is there a simple way to do this in R with my data as is, or do I need to restructure my data? If I need to restructure, how would I go about that?
Look into pivot_longer(); it has the functionality you're looking for. In long format you can also use the group_by() command to get means for each time interval. Both of these are tidyverse functions.
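For example, a minimal sketch assuming the data frame is named df and has the columns shown above (pivot to long format, average by stimulus and interval, then plot):

library(tidyr)
library(dplyr)
library(ggplot2)

df_long <- df %>%
  pivot_longer(c(base, hour, two_hour), names_to = "interval", values_to = "score") %>%
  group_by(stimuli_no, interval) %>%
  summarise(mean_score = mean(score), .groups = "drop")

ggplot(df_long, aes(x = factor(interval, levels = c("base", "hour", "two_hour")),
                    y = mean_score, group = stimuli_no, colour = factor(stimuli_no))) +
  geom_line() +
  labs(x = "Time interval", y = "Mean score", colour = "Stimulus")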
I have a dataset that looks like this:
time value
0.1 2
0.1 3
0.2 2
0.2 4
0.2 1
I also have a threshold value threshold = 2.5
I want to combine the rows with the same time point and calculate the proportion of values above the threshold, returning a table like this:
time ratio
0.1 0.50
0.2 0.33
I have tried to use data.table to combine all the rows with the same time points, but I am having some trouble coming up with a function to calculate the ratio at each time point. I know that I could write a for loop, but I am wondering if there's a more R way to solve this problem.
My thought is something like this:
results <- dt[,some_function(value, threshold),by=time]
Thank you in advance.
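One idiomatic approach, sketched below under the assumption that dt is a data.table with the columns shown: taking mean() of a logical vector gives the proportion of TRUEs, which is exactly the ratio above the threshold.

library(data.table)
dt <- data.table(time = c(0.1, 0.1, 0.2, 0.2, 0.2),
                 value = c(2, 3, 2, 4, 1))
threshold <- 2.5
results <- dt[, .(ratio = mean(value > threshold)), by = time]
results
#    time     ratio
# 1:  0.1 0.5000000
# 2:  0.2 0.3333333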
So I have a dataframe, PVALUES, like this:
PVALUES <- read.csv(textConnection("PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01"),sep = " ")
That corresponds to another dataframe, DATA, like this:
DATA <- read.csv(textConnection("COL1 COL2 COL3
10 2 9
11 20 200
2 3 5"),sep=" ")
For every row in DATA, I'd like to take the mean of the numbers whose indices correspond to entries in PVALUES that are <= 0.05.
So, for example, the first row in PVALUES only has two entries <= 0.05, the entries in [1,2] and [1,3]. Therefore, for the first row of DATA, I want to take the mean of 2 and 9.
In the second row of PVALUES, only the entry [2,2] is <= 0.05, so instead of taking a mean for the second row of DATA, I would just use DATA[2,2], which is 20.
So, my output would look like:
MEANS
6.5
20
3.33
I thought I might be able to generate indices for every entry in PVALUES <=0.05, and then use that to select entries in DATA to use for the mean. I tried to use this command to generate indices:
exp <- which(PVALUES[,]<=0.05, arr.ind=TRUE)
...but it only picks up on indices for entries in the first column that are <= 0.05. In my example above, it would only output [3,1].
Can anyone see what I'm doing wrong, or have ideas on how to tackle this problem?
Thank you!
It's a bit funny looking, but this should work
rowMeans(`is.na<-`(DATA, PVALUES > .05), na.rm = TRUE)
The "ugly" part is calling is.na<- without doing the automatic replacement, but here we just set all data with p-values larger than .05 to missing and then take the row means.
It's unclear to me exactly what you were doing with exp, but that type of method could work as well. Maybe with
expx <- which(PVALUES[,]<=0.05, arr.ind=TRUE)
aggregate(val ~ row, cbind(expx, val = DATA[expx]), mean)
(renamed so as not to interfere with the built in exp() function)
Tested with
PVALUES<-read.table(text="PVAL1 PVAL2 PVAL3
0.1 0.04 0.02
0.9 0.001 0.98
0.03 0.02 0.01", header=T)
DATA<-read.table(text="COL1 COL2 COL3
10 2 9
11 20 200
2 3 5", header=T)
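For reference, with those two frames the one-liner masks DATA like this and then averages the remaining values per row (note 5.5 for row 1, not the 6.5 given in the question):

`is.na<-`(DATA, PVALUES > .05)
#   COL1 COL2 COL3
# 1   NA    2    9
# 2   NA   20   NA
# 3    2    3    5
rowMeans(`is.na<-`(DATA, PVALUES > .05), na.rm = TRUE)
#        1         2         3
# 5.500000 20.000000  3.333333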
I usually enjoy MrFlick's responses, but the use of is.na<- in that manner seems to violate my expectations of R code because it destructively modifies the data. I admit that I probably should have been expecting that possibility because of the assignment arrow, but it surprised me nonetheless. (I don't object to data.table code, because it is honest and forthright about modifying its contents with the := function.) I also admit that my efforts to improve on it led me down a rabbit hole where I found this equally "baroque" effort. (Note that you have incorrectly averaged 2 and 9: the mean is 5.5, not 6.5.)
sapply(split(DATA[which(PVALUES <= 0.05, arr.ind = TRUE)],
             which(PVALUES <= 0.05, arr.ind = TRUE)[, "row"]),
       mean)
1 2 3
5.500000 20.000000 3.333333
I have a small data.table representing one record per test cell (AB testing results) and want to add several more columns that compare each test cell against each other test cell. In other words, the number of columns I want to add will depend upon how many test cells are in the AB test in question.
My data.table looks like:
Group Delta SD.diff
Control 0 0
Cell1 0.00200 0.001096139
Cell2 0.00196 0.001095797
Cell3 0.00210 0.001096992
Cell4 0.00160 0.001092716
And I want to add the following columns (numbers are trash here):
Group v.Cell1 v.Cell2 v.Cell3 v.Cell4
Control 0.45 0.41 0.45 0.41
Cell1 0.50 0.58 0.48 0.66
Cell2 0.58 0.50 0.58 0.48
Cell3 0.48 0.58 0.50 0.70
Cell4 0.66 0.48 0.70 0.50
I am sure that do.call is the way to go, but I can't work out how to embed one do.call inside another to generate the script, and I can't work out how to then execute the scripts (20 lines in total). The closest I currently have is:
a <- do.call("paste",c("test.1.results <- mutate(test.1.results, P.Better.",list(unlist(test.1.results[,Group]))," = pnorm(Delta, test.1.results['",list(unlist(test.1.results[,Group])),"'][,Delta], SD.diff,lower.tail=TRUE))", sep=""))
Which produces 5 script lines like:
test.1.results <- mutate(test.1.results, P.Better.Cell2 = pnorm(Delta, test.1.results['Cell2'][,Delta], SD.diff,lower.tail=TRUE))
which only compares each test cell's results against itself, giving a 0.50 result (difference due to chance). That is no use whatsoever, as I need each cell compared to every other cell.
Not sure where to go with this one.
Update: in v1.8.11, FR #2077 is now implemented: set() can now add columns by reference. From NEWS:
set() is able to add new columns by reference now. For example, set(DT, i=3:5, j="bla", 5L) is equivalent to DT[3:5, bla := 5L]. This was FR #2077. Tests added.
Tasks like this are often easier with set(). To demonstrate, here's a translation of what you have in the question (untested). But I realise you want something different from what you've posted (which I don't quite follow at a quick read).
for (i in paste0("Cell",1:4))
set(DT, # the data.table to update/add column by reference
i=NULL, # no row subset, NULL is default anyway
j=paste("P.Better.",i), # column name or position. must be name when adding
value = pnorm(DT$Delta, DT[i][,Delta], DT$SD.diff, lower.tail=TRUE)
Note that you can assign to just a subset of a new column and the rest will be filled with NA, both with := and set().
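For completeness, a minimal runnable sketch of that loop using the question's numbers; this is my own illustrative setup, assuming DT is keyed by Group so that DT[g, Delta] does a keyed row lookup (keying also sorts the rows by Group):

library(data.table)
DT <- data.table(Group   = c("Control", paste0("Cell", 1:4)),
                 Delta   = c(0, 0.00200, 0.00196, 0.00210, 0.00160),
                 SD.diff = c(0, 0.001096139, 0.001095797, 0.001096992, 0.001092716),
                 key     = "Group")
for (g in paste0("Cell", 1:4)) {
  set(DT,
      j = paste0("P.Better.", g),
      value = pnorm(DT$Delta, DT[g, Delta], DT$SD.diff, lower.tail = TRUE))
}
DT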
I don't think this question has been asked yet (most similar questions are about extracting data or returning a count). I am new to R, so any help would be appreciated!
I have a dataset of multiple runs of an experiment in one file, and the data looks like this, where I have all the time steps for each run in rows:
time [info] id (unique per run)
I am attempting to calculate when the system reaches equilibrium, which I am defining as stable values in 3 interdependent parameters. I would like to have the contents of rows compared and if they are within 5% of each other over 20 timesteps, to return the timestep at which the stability begins and the id.
So far, I'm thinking it will be something like the following, or maybe a while loop (sorry for the bad formatting):
y = 1
z = 0   # variables to control the loop
x = 0
for (ID) {
  if (CC at time x is within ±0.05 of CC at time y) {
    if (z <= 20) {   # catalogs the number of periods that match
      y = y + 1
      z = z + 1
    } else {
      [save value in column]
    }
  } else {           # no match for a sustained period, so start over
    x = x + 1
    y = x + 1
    z = 0
  }
}
ETA: CC is one of my parameters of interest and ranges between 0 and 1, although the endpoints are unlikely.
Here's a simple example that might help; this is something like how my data looks:
zz <- textConnection("time CC ID
1 0.99 1
2 0.80 1
3 0.90 1
4 0.91 1
5 0.92 1
6 0.91 1
1 0.99 2
2 0.90 2
3 0.90 2
4 0.91 2
5 0.92 2
6 0.91 2")
Data <- read.table(zz, header = TRUE)
close(zz)
My question is: how can I run through the rows to find out when the value of CC becomes 'stable' (meaning it doesn't change by more than 0.05 over X (here, 3) time steps), so that it would produce the following results:
ID timeToEQ
1 1 3
2 2 2
Does this help? The only way I can think of to do this is with a for loop, and I think there must be an easier way!
Here is my code; I will post the explanation shortly.
require(plyr)
ddply(Data, .(ID), summarize, timeToEQ = Position(isTRUE, abs(diff(CC)) < 0.05 ))
ID timeToEQ
1 1 3
2 2 2
EDIT. Here is how it works.
ddply breaks Data into subsets based on ID.
diff(CC) computes the difference between CC of successive rows.
abs(diff(CC)) < 0.05 returns TRUE where successive readings have stabilized (differ by less than 0.05).
Position locates the first instance of an element which satisfies isTRUE.
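If you later need the sustained-stability version from the original question (stable over X consecutive time steps rather than just the first small change), here is a hedged base-R sketch; stable_start is a hypothetical helper name, and it assumes each ID's rows are ordered by time:

stable_start <- function(cc, x = 3, tol = 0.05) {
  ok   <- abs(diff(cc)) < tol   # TRUE where the step-to-step change is small
  runs <- rle(ok)               # lengths of consecutive TRUE/FALSE stretches
  ends <- cumsum(runs$lengths)
  i <- which(runs$values & runs$lengths >= x - 1)[1]   # first stretch long enough
  if (is.na(i)) NA_integer_ else ends[i] - runs$lengths[i] + 1
}
aggregate(CC ~ ID, Data, stable_start)   # the CC column here holds timeToEQ
#   ID CC
# 1  1  3
# 2  2  2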