I'm trying to get the ST of the last 20 values for each row in a data.frame. The procedure would be something like this in Excel, but I'm trying to do it in R with dplyr.
Your question is thin and does not provide a data source, but here is an example of what you want. This is a data frame of random numbers with 50 rows and two columns.
The last twenty rows (this is what you want) of that data frame are selected and used in a loop, and a math function is applied to each of these 20 rows.
set.seed(14)
n <- sample(100, 50)
b <- sample(200, 50)
cdf <- data.frame(n, b) # new data frame with two columns of random numbers
nrow(cdf)
last <- nrow(cdf) - 19 # index of the first of the last 20 rows (rows 31 to 50)
for (i in last:nrow(cdf)) { # loop to apply math to the last 20 rows of data
print(mean(cdf$b[i]))
print(i)
}
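If "ST" in the question means standard deviation (an assumption, the question does not say), a rolling version over the previous 20 values might look like the sketch below. The column name `b_sd20` and the choice to leave the first 19 rows as NA are illustrative, not something the question specifies.

```r
set.seed(14)
cdf <- data.frame(n = sample(100, 50), b = sample(200, 50))
window <- 20  # assumed window size, taken from the question

# For each row i, take the SD of rows (i - window + 1):i of column b;
# rows without a full window of history get NA.
cdf$b_sd20 <- sapply(seq_len(nrow(cdf)), function(i) {
  if (i < window) NA_real_ else sd(cdf$b[(i - window + 1):i])
})

tail(cdf, 3)
```

The same sapply() call can sit inside dplyr::mutate() if you prefer a pipeline, and packages such as zoo offer rollapply() for the same job.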
I have a .csv file of 39 variables and 713 rows, each containing a count of plastic items. I have another column which is the survey length, and I want to standardise each count of items by a survey length of 100. I am unsure how to create a loop to run through each row and cell individually to do this. Many also have NA values.
Any ideas would be great.
Thank you.
Consider applying the formula directly to the columns, with no need for a loop:
# RETRIEVE ALL COLUMN NAMES (MINUS SURVEY LENGTH)
vars <- names(df)[!grepl("survey_length", names(df))]
# EXPAND SINGLE COLUMN TO EQUAL DIMENSION OF DATA FRAME
survey_length_mat <- matrix(df$survey_length, ncol=length(vars), nrow=nrow(df))
# APPLY FORMULA
df[vars] <- (df[vars] / survey_length_mat) * 100
df
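To see how this behaves with missing values, here is a small sketch on made-up data. The column names `bottles` and `bags` are hypothetical, and `survey_length` is assumed to be the name of the length column. It relies on the fact that dividing a data frame by a vector of length nrow(df) divides each column element by element, so the explicit matrix is optional.

```r
# Hypothetical toy data: two item counts plus the survey length
df <- data.frame(
  bottles = c(4, NA, 10),
  bags = c(2, 6, NA),
  survey_length = c(50, 200, 100)
)

vars <- setdiff(names(df), "survey_length")

# Each count is scaled by the survey length of its own row;
# NA counts simply stay NA.
df[vars] <- df[vars] / df$survey_length * 100
df
```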
I have a dataframe, for all columns of which I want to calculate paired ratios of rows (for example, row1/row2, row3/row4, row5/row6, etc.) and write the result of calculation to a new dataframe. I decided to wrap it in a function with 3 arguments:
paired_row_rat=function(dataframe,rows,columns){
ratio_df=data.frame(matrix(nrow=rows/2,ncol=columns)) #creates new dataframe
#where number of columns is the same as in dataframe used for
#calculation, number of rows for paired ratios will be 2 times lower
cln=colnames(dataframe) #names of columns should be equal in both
colnames(ratio_df)=cln #dataframes
i=seq(1,rows,by=2) #sequence for choosing the first row of calculation
j=i+1 #for choosing second row of calculation
for (k in 1:nrow(ratio_df)){ #here as I am trying to fill new
ratio_df[k,]=dataframe[i,]/dataframe[j,] #dataframe with ratios,
} #the error appears
return(ratio_df)
}
pmap(list(tula3,24,98),paired_row_rat)
#runs the function for my dataframe with 24 rows and 98 columns
In the resulting dataframe each column has the same values for all rows and I have warnings from R:
warnings()
Warning messages:
1: In [<-.data.frame(*tmp*, k, , value = structure(list( ... :
replacement element 1 has 12 rows to replace 1 rows
I've searched a lot for possible solutions but still can't fix this problem. Something is wrong with the for loop, but I don't understand where the problem is.
Data frame used for calculation (the result of head(df)):
Assuming the requirement is to calculate the ratio for the pairs row1/row2, row3/row4 and so on,
try this:
as.data.frame(t(sapply(seq(1,(nrow(df)-1),2),function(x,df){df[x,]/df[x+1,]},df)))
where df is your data.frame
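The warning in the question comes from i and j being whole vectors: dataframe[i,] selects all 12 odd rows on every pass of the loop, and only the first fits into ratio_df[k,]. Indexing with i[k] and j[k] would repair the loop, but the loop can probably be dropped altogether. A minimal sketch on a made-up data frame (the columns `a` and `b` are hypothetical, and an even number of rows is assumed):

```r
# Hypothetical toy data frame with an even number of rows
df <- data.frame(a = c(2, 1, 9, 3), b = c(8, 4, 10, 5))

odd  <- seq(1, nrow(df), by = 2)  # rows 1, 3, ...
even <- odd + 1                   # rows 2, 4, ...

# Element-wise division of the two row subsets gives row1/row2, row3/row4, ...
ratio_df <- df[odd, ] / df[even, ]
rownames(ratio_df) <- NULL
ratio_df
```

This keeps ordinary numeric columns, whereas the sapply() version returns list columns that may need unlisting afterwards.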
I have a data matrix in R with 45 rows. Each row represents the value of an individual sample. I need to do a trial simulation: I want to pair up samples randomly and calculate their differences. I want a large sample (maybe 10000) from all the possible permutations and combinations.
This is how I have managed to do it so far:
My data matrix ("data") has 45 rows and 2 columns. I selected 45 rows randomly and subtracted another 45 randomly selected rows from them.
n1<-(data[sample(nrow(data),size=45,replace=F),])-(data[sample(nrow(data),size=45,replace=F),])
This gave me a random set of differences of size 45.
I made 50 such vectors (n1 to n50) and did rbind, which gave me a big data matrix containing random differences.
Of course, some rows in the first random set and the second random set were the same and cancelled out. I removed them with the following code:
row_sub = apply(new, 1, function(row) all(row !=0 ))
new.remove.zero<-new[row_sub,]
But is there a cleaner way to do this? A simpler way to generate all possible random pairs of rows, calculate their differences and bind them together as a new matrix?
Thanks in advance.
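One possible approach, sketched under assumptions: enumerate every pair of distinct rows with combn() and sample from those pairs, so a row is never paired with itself and no zero rows need removing. The matrix below is a random stand-in for the real `data`, and `n_draws` is a hypothetical name. With 45 rows there are only 990 distinct pairs, so 10000 draws have to be made with replacement.

```r
set.seed(1)
# Hypothetical stand-in for the 45 x 2 matrix called `data` in the question
data <- matrix(rnorm(90), nrow = 45, ncol = 2)

# 2 x 990 matrix: each column is one unordered pair of distinct row indices
pairs <- combn(nrow(data), 2)

n_draws <- 10000
picked <- pairs[, sample(ncol(pairs), n_draws, replace = TRUE), drop = FALSE]

# Difference between the two rows of each sampled pair
diffs <- data[picked[1, ], , drop = FALSE] - data[picked[2, ], , drop = FALSE]
dim(diffs)
```

Because combn() always lists the lower row index first, every difference is (lower row) minus (higher row); if the sign should be random too, the two index rows could be swapped at random before subtracting.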
My Problem:
I have a dataframe consisting of 86016000 rows of observations:
there are 512000 observations for each hour
there are 24 hours data for seven days
So 24*7*512000 = 86016000
there are 40 columns (variables)
There is no column of date or datetimestamp
Only row numbers are good enough to identify how many obs. for each day, and there are no errors in recording of this data.
Given such a large dataset, what I want to do is create subsets of 12288000 (i.e. 24 * 512000) rows, so that we have seven subsets, one per day.
What I tried:
d <- split(PltB_Fold3_1_Data, rep(1:12288000, each=7))
But unfortunately, after almost half an hour, I terminated the process as there was no result.
Is there any better solution than the one above?
You're probably looking for seq rather than rep. With seq, you can generate a sequence of numbers from 0 to 86016000 incremented by 12288000.
To save resources, you can then use this sequence to generate temporary data frames and do whatever you want with each.
sequence <- seq(from = 0, to = 86016000, by = 12288000)
for(i in 1:(length(sequence)-1)){
temp <- df[(sequence[i] + 1):sequence[i + 1], ] # parentheses needed: ":" binds tighter than "+"
# do something here with your temporary data frame
}
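For what it's worth, the split() attempt in the question was slow because rep(1:12288000, each=7) builds 12288000 groups of 7 rows each, not 7 groups of 12288000 rows. Swapping the arguments gives one group per day. A scaled-down sketch, where the tiny data frame and the names `rows_per_day`, `n_days` and `daily` are made up for illustration:

```r
# Hypothetical scaled-down stand-in: 7 "days" of 6 rows each instead of 12288000
rows_per_day <- 6
n_days <- 7
df <- data.frame(x = seq_len(rows_per_day * n_days))

# One group label per row: 1,1,...,1,2,2,...,7
day <- rep(seq_len(n_days), each = rows_per_day)
daily <- split(df, day)

sapply(daily, nrow)
```

On the full 86016000-row data this still copies every row into the list, so the temporary-subset loop may remain the lighter option if memory is tight.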
Sorry, really beginner question: I want to generate a data frame with random data. I want my data frame to be 10 rows by 20 columns, where each row contains data from a random sample generated by rnorm. How do I do this?
Producing a matrix may be easier, but this can be converted to a dataframe:
rownum <- 10
colnum <- 20
yourdf <- as.data.frame(matrix(rnorm(rownum * colnum), nrow=rownum))
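If you want it to be explicit that each row is its own rnorm() sample, a variant with replicate() is sketched below. The seed is arbitrary and only there to make the example reproducible.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
rownum <- 10
colnum <- 20

# replicate() returns a colnum x rownum matrix (one sample per column),
# so transpose it to put one sample in each row.
yourdf <- as.data.frame(t(replicate(rownum, rnorm(colnum))))
dim(yourdf)
```

For independent standard normal draws the two versions are statistically equivalent; the replicate() form mainly helps if each row later needs its own mean or sd.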