Let's say I have a df with 100 rows. 25 of these rows match a specific criterion. I want to divide the total number of rows in my df by the number of matching rows and add the value to a vector.
e.g. 100/25 = 4 ===> c(4)
x<-1:100
Criteria:
<=25
avector<-length(x)/sum(x<=25)
If you need to append to an existing vector use append.
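A minimal sketch of the append step, assuming a hypothetical existing vector called results (that name and its starting values are not from the question):

```r
x <- 1:100
# hypothetical existing vector of earlier results
results <- c(2, 5)
# total count divided by the number of matching elements: 100 / 25
ratio <- length(x) / sum(x <= 25)
# append() returns a new vector, so assign it back
results <- append(results, ratio)
print(results)
```

Note that append does not modify its argument in place; the result has to be assigned back.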
I think I found a solution:
avector <- nrow(data) / nrow(subset(data, var1 > 999))
Related
I have a matrix with 2 columns and 100 rows. For example, let's say that in the first column I have random numbers from 1 to 100 and in the second column I have random numbers from 100 to 200. If I sort the 1st column from lowest to highest, can I move the values in the 2nd column along with it, so that each row stays together?
For example in row number 40, I have: ex[40,1] <- 1 and ex[40,2]<-150.
Sort the 1st column, and in the first row I want to have:
ex[1,1]<-1 ex[1,2]<-150
The dplyr package has a nice arrange function you could use like so:
ex <- data.frame(col1=c(2,1,3), col2=c("b","a","c")) #your data.frame
library(dplyr)
arrange(ex, col1) #sort increasing (the default)
#OR
arrange(ex, -col1) #sort decreasing
You can try order, as below:
ex[order(ex[,1]),]
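As a small illustration of the order approach on a matrix, here is a sketch with made-up values (the 3-row matrix is an assumption, used only to keep the example short). Whole rows are reordered, so each pair of values stays together:

```r
# hypothetical 3-row matrix standing in for the 100-row one
ex <- cbind(c(30, 1, 12), c(110, 150, 175))
# order() returns the row permutation that sorts column 1 ascending
sorted <- ex[order(ex[, 1]), ]
print(sorted)
```

Here the row holding 1 and 150 ends up first, as asked in the question.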
I have a data frame with 2 columns and 26 rows, the first column is composed of characters while the second column is composed of numbers.
I also have a vector with a random selection of 5 characters.
I want to sum the numbers from column two of the 5 random characters.
How can I calculate this sum?
We can use aggregate to get the sum for each character:
aggregate(ints ~ char, data1, sum)
Maybe what you need is:
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
This sums the ints values in data1 for the rows whose char appears in sample1.
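A self-contained sketch of the %in% approach, assuming the column names char and ints used above and made-up data (the letters, the values 1 to 26 and the fixed selection are all hypothetical):

```r
# hypothetical data: 26 letters with the values 1..26
data1 <- data.frame(char = letters, ints = 1:26, stringsAsFactors = FALSE)
# hypothetical selection of 5 characters, fixed here so the result is reproducible
sample1 <- c("a", "e", "j", "t", "z")
# keep only the ints whose char is in sample1, then sum them
result <- sum(data1$ints[data1$char %in% sample1], na.rm = TRUE)
print(result)
```

With these values the selected numbers are 1, 5, 10, 20 and 26, so the sum is 62.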
I have a dataframe with 3 columns. First two columns are IDs (ID1 and ID2) referring to the same item and the third column is a count of how many times items with these two IDs appear. The dataframe has many rows so I want to use binary search to first find the appropriate row where both IDs match and then add 1 to the cell under the count column in that row.
I have used the which() function to find the index of the correct row and then using the index added 1 to the count column.
For example:
index <- which(DF$ID1 == x & DF$ID2 == y)
DF$Count[index] <- DF$Count[index] + 1
While this works, the which function is very inefficient. Because I have to do this within a for loop for more than a trillion times, it takes a lot of time. Also, there is only one row in the data frame with this ID combination. While the which function goes through all the rows, a function that stops once it finds the correct row should suffice. I have looked into using data.table and setkey for this purpose but do not know how to implement that for my purpose. Thank you in advance.
Indeed, you can use data.table with a key on both columns. setkeyv takes the column names as a character vector; setkey(DF, ID1, ID2) would do the same with unquoted names.
library(data.table)
# example data: one row per (ID1, ID2) pair, with Count starting at 0
DF <- expand.grid(ID1=1:100, ID2=1:100)
DF$Count <- 0
# convert DF to a data.table
DF <- as.data.table(DF)
# put both ID1 and ID2 as keys, in that order, so lookups use binary search
setkeyv(DF,c("ID1","ID2"))
# example x and y values
x <- 10L
y <- 18L
# add 1 to Count, by reference, in the row where ID1=x and ID2=y
DF[.(x,y), Count := Count + 1]
I am trying to find the element in the column ID that satisfies a combined condition: the column GROUP equals 2, and the column OBS takes its maximum value within that group.
How can I do this in R?
Here is my dataset:
ID <- as.factor(c("A","B","C","D","E","F"))
OBS <- c(1,3,2,8,3,10)
GROUP <- as.factor(c(1,1,1,2,2,2))
df <- data.frame(ID,OBS,GROUP)
Thanks a lot.
Assuming that you mean that you want to first subset the data frame by the condition that GROUP should be equal to 2, and then identify the ID for which the value of OBS is highest, this should do the trick:
df2 <- df[df$GROUP==2,]
df2$ID[df2$OBS==max(df2$OBS)]
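An alternative sketch using which.max, run on the dataset from the question. The name best is introduced here for illustration. Unlike the comparison with max, which.max returns only the first position when several rows tie for the maximum:

```r
ID <- as.factor(c("A","B","C","D","E","F"))
OBS <- c(1,3,2,8,3,10)
GROUP <- as.factor(c(1,1,1,2,2,2))
df <- data.frame(ID,OBS,GROUP)
# keep only the rows where GROUP equals 2
df2 <- df[df$GROUP==2,]
# which.max() gives the position of the first maximum of OBS
best <- as.character(df2$ID[which.max(df2$OBS)])
print(best)
```

For this data the largest OBS in group 2 is 10, which belongs to ID "F".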
I have a dataset that has 5 columns and 50 rows. I want to divide it randomly into two parts, one with 35 rows and the other with 15. Then I would like to add another column to this dataset which contains the value TRUE/FALSE: TRUE if the row belongs to the 35 randomly selected rows and FALSE if it belongs to the 15. How do I achieve this in R?
All help is greatly appreciated..
Thanks
We create a vector of 'TRUE/FALSE' elements using rep by specifying the times to replicate the 'TRUE/FALSE' values, sample it, and create a new column ('ind') by assigning the output. Then, split the dataset into a list of 2 data.frames by 'ind' column.
df1$ind <- sample(rep(c(TRUE, FALSE), times = c(35, 15)))
split(df1, df1$ind)
data
set.seed(24)
df1 <- as.data.frame(matrix(sample(9, 50*5, replace=TRUE), ncol=5))
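To check that the split has the requested sizes, the steps above can be run end to end as in this sketch (the names parts and sizes are introduced here for illustration):

```r
set.seed(24)
df1 <- as.data.frame(matrix(sample(9, 50*5, replace=TRUE), ncol=5))
# 35 TRUE and 15 FALSE values in random order
df1$ind <- sample(rep(c(TRUE, FALSE), times = c(35, 15)))
parts <- split(df1, df1$ind)
# split() names the list elements after the values of ind: "FALSE" and "TRUE"
sizes <- sapply(parts, nrow)
print(sizes)
```

The counts are fixed by rep, so the TRUE part always has 35 rows and the FALSE part 15; only which rows land in each part depends on the seed.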