I have one vector
>a<-c(4,5,6,7,8)
I have one data.frame
>df<-data.frame(start=c(1,4),end=c(3,5))
I want to create a third column in this df based on the start-end
>df
start end
1 1 3 mean(a[1:3])
2 4 5 mean(a[4:5])
Of course, mean(a[df$start:df$end]) does not work.
I have solved this the long way by creating a new data.frame, but I am wondering if there is a shorter way to do it.
We can use mapply to build the index sequence from each pair of 'start' and 'end' values, subset 'a' with that index, take the mean, and assign the result to a new column ('Mean') in 'df'.
df$Mean <- mapply(function(x,y) mean(a[seq(x,y)]), df$start, df$end)
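As a quick check, the mapply call above can be run on the data from the question. This is a minimal sketch; the argument names s and e are only illustrative.

```r
# Data from the question
a  <- c(4, 5, 6, 7, 8)
df <- data.frame(start = c(1, 4), end = c(3, 5))

# For each row, average the elements of 'a' between start and end (inclusive)
df$Mean <- mapply(function(s, e) mean(a[seq(s, e)]), df$start, df$end)

df$Mean  # 5.0 and 7.5, i.e. mean(a[1:3]) and mean(a[4:5])
```

mapply walks the two columns in parallel, so each call sees exactly one start and one end value.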
Related
My first question here...
I have 2 dataframes, both with a different number of rows.
The first one has 3 columns, the second one has 1 column.
I want to make all combinations of values from the 1st column of the 1st dataframe with values in the 1st (and only) column of the second dataframe, and values of 2nd column of 1st dataframe with values in 1st (and only) column of second dataframe, and so on...
I assume the result will be a one-column dataframe (?).
Something like this:
Attempts with combn did not help me yet...
Thanks!
Probably not exactly what you want, but it provides a starting point. Assuming your first data frame is called df and the other one (with one column) is called df2:
#make data long using tidyr
df_long <- tidyr::pivot_longer(df, cols = c("loc1", "loc2", "loc3"))
#cartesian join with the single column of df2 (CJ comes from data.table and expects vectors)
data.table::CJ(df_long$value, df2[[1]])
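The same idea can be sketched in base R without extra packages. The question's example data is not shown, so the data frames below, the column name code, and all values are assumptions made up for illustration.

```r
# Hypothetical stand-ins for the two data frames in the question
df  <- data.frame(loc1 = c("a", "b"), loc2 = c("c", "d"), loc3 = c("e", "f"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(code = c("x", "y", "z"), stringsAsFactors = FALSE)

# Flatten the three columns of df into one vector, column by column
values <- unlist(df, use.names = FALSE)

# Every combination of a value from df with a value from df2
combos <- expand.grid(value = values, code = df2$code, stringsAsFactors = FALSE)

nrow(combos)  # 6 values times 3 codes = 18 rows
```

The result has two columns; paste(combos$value, combos$code) would collapse it to a single column if that is what is needed.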
I have a dataframe with 3 columns. First two columns are IDs (ID1 and ID2) referring to the same item and the third column is a count of how many times items with these two IDs appear. The dataframe has many rows so I want to use binary search to first find the appropriate row where both IDs match and then add 1 to the cell under the count column in that row.
I have used the which() function to find the index of the correct row and then using the index added 1 to the count column.
For example:
index <- which(DF$ID1 == x & DF$ID2 == y)
DF$Count[index] <- DF$Count[index] + 1
While this works, the which function is very inefficient. Because I have to do this within a for loop for more than a trillion times, it takes a lot of time. Also, there is only one row in the data frame with this ID combination. While the which function goes through all the rows, a function that stops once it finds the correct row should suffice. I have looked into using data.table and setkey for this purpose but do not know how to implement that for my purpose. Thank you in advance.
Indeed, you can use data.table with a key on both columns. Either setkey(DF, ID1, ID2) or setkeyv(DF, c("ID1","ID2")) works; setkeyv takes the column names as a character vector.
library(data.table)
DF <- data.frame(ID1=sample(1:100,100000,replace=TRUE),ID2=sample(1:100,100000,replace=TRUE),Count=0L)
# convert DF to a data.table
DF <- as.data.table(DF)
# set both ID1 and ID2 as the key, in that order
setkeyv(DF,c("ID1","ID2"))
# example x and y values
x <- 10
y <- 18
# binary-search the rows where ID1 == x and ID2 == y and add 1 to Count by reference
# (this random sample may contain several such rows; all of them are updated)
DF[.(x,y), Count := Count + 1L]
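If the (x, y) pairs can be collected up front instead of being processed one per loop iteration, the counting can be done in a single vectorised pass. This is a different technique from the keyed data.table lookup above: a base R sketch using match on a pasted composite key. The tiny data frames and the name queries are hypothetical.

```r
# Hypothetical lookup table: each (ID1, ID2) pair occurs exactly once
DF <- data.frame(ID1 = c(1, 1, 2, 2), ID2 = c(10, 20, 10, 20), Count = 0L)

# Hypothetical batch of pairs to count
queries <- data.frame(x = c(1, 2, 1, 1), y = c(20, 10, 20, 10))

# Build one string key per row so a pair can be matched in a single comparison
key_df <- paste(DF$ID1, DF$ID2, sep = "_")
key_q  <- paste(queries$x, queries$y, sep = "_")

# Row of DF for each query (NA if the pair is absent from DF)
idx <- match(key_q, key_df)

# Count how often each row was hit and add the hits to Count
hits <- tabulate(idx[!is.na(idx)], nbins = nrow(DF))
DF$Count <- DF$Count + hits

DF$Count  # 1 2 1 0
```

Whether this beats the keyed lookup depends on how the pairs are produced; it only applies when they are available as a batch.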
I am trying to subset a data frame by taking the integer values of 2 columns of my data frame
Subs1<-subset(DATA,DATA[,2][!is.na(DATA[,2])] & DATA[,3][!is.na(DATA[,3])])
but it gives me an error : longer object length is not a multiple of shorter object length.
How can I construct a subset which is composed of NON NA values of column 2 AND column 3?
Thanks a lot!
Try this:
Subs1<-subset(DATA, (!is.na(DATA[,2])) & (!is.na(DATA[,3])))
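The second argument of subset is a logical vector of length nrow(DATA), indicating whether to keep each row. Your original expression drops the NA values from each column before combining them, so the two vectors can end up with different lengths, which is what triggers the error.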
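A small worked example may make the row-wise logic concrete. The data frame below is hypothetical (column names id, v2 and v3 are made up); only the subset call mirrors the answer above.

```r
# Hypothetical data: NA in column 2 for row 2, NA in column 3 for row 1
DATA <- data.frame(id = 1:4,
                   v2 = c(1L, NA, 3L, 4L),
                   v3 = c(NA, 2L, 3L, 4L))

# One TRUE/FALSE per row: TRUE only when both columns are non-NA
keep <- !is.na(DATA[, 2]) & !is.na(DATA[, 3])

Subs1 <- subset(DATA, keep)

# Same rows selected with complete.cases restricted to columns 2 and 3
Subs2 <- DATA[complete.cases(DATA[, 2:3]), ]

Subs1$id  # 3 4
```

Both versions keep all columns of DATA and drop only the rows where column 2 or column 3 is NA.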
The na.omit function can also answer your question. Note that the call below keeps only columns 2 and 3 of DATA:
Subs1 <- na.omit(DATA[2:3])
[https://stat.ethz.ch/R-manual/R-patched/library/stats/html/na.fail.html]
Here is an example.
a, b and c are three vectors; a and b each contain a missing value.
Once they are created, I use cbind to bind them into one matrix, which is then converted to a data frame.
The final result is a data frame in which 2 of the 3 columns contain a missing value.
So we need to keep only the rows with complete cases. DATA[complete.cases(DATA), ] keeps only the rows that have no missing values in any column, and the object subset holds those rows.
a <- c(1,NA,2)
b <- c(NA,1,2)
c <- c(1,2,3)
DATA <- as.data.frame(cbind(a,b,c))
subset <- DATA[complete.cases(DATA), ]
I know how to add a column that is, say, the sum of two other columns, but I'm looking for a way to make a new column that equals the sum of a subset of rows in another column.
For example, I have a table, "table.1" and the third column "table.1[3]" consists of numbers. I want to add a fourth column such that the 1st row of column 4 = the sum of the values in column 3 from row 1 to 100; the 2nd row = sum of column 3 from row 2 to 101, and so on.
Essentially, at row x, I want table.1[x, 4]=sum(table.1[x:(x+99), 3])
Anyone know how I can add a column like that? Thanks.
One way would be to use the embed() function. In this example your "column 3" is named "a" in the data.frame, and I'm adding a column named "b" such that dd$b[x]<-sum(dd$a[x:(x+4-1)]), so I'm using a window of N=4 rather than N=100 for simplicity.
dd<-data.frame(a=c(1,3,6,4,2,4,6,7,8,1))
N<-4
dd$b<-rowSums(embed(c(dd$a, rep.int(0,N-1)), N))
Note that I padded the end of the dd$a vector so that when the range "goes off the end", I assume those values to be 0.
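As a cross-check on the embed() approach, the same forward-looking window sums can be sketched with cumsum: the sum of rows x to x+N-1 is the difference of two cumulative sums. This is an alternative technique, not part of the answer above; the names padded, cs and b_cumsum are illustrative.

```r
dd <- data.frame(a = c(1, 3, 6, 4, 2, 4, 6, 7, 8, 1))
N  <- 4
n  <- nrow(dd)

# Pad the end with zeros so windows that run off the end still have N values
padded <- c(dd$a, rep.int(0, N - 1))

# cs[k + 1] holds the sum of the first k padded values (cs[1] is 0)
cs <- cumsum(c(0, padded))

# Sum of rows x .. x+N-1 equals cs[x + N] - cs[x]
b_cumsum <- cs[(N + 1):(n + N)] - cs[1:n]

# The embed() version from the answer, for comparison
b_embed <- rowSums(embed(padded, N))

b_cumsum  # 14 15 16 16 19 25 22 16 9 1
```

Both vectors agree on this data, including the last N-1 rows where the window is padded with zeros.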
I have a data frame (df) with 8 columns and 1200 rows. Among those 8 columns I want to find the minimum value of column 7 and find the corresponding value of column 2 in that particular row where the minimum value of column 7 was found. Also column 2 holds characters so I want a character vector giving me its value.
I found the minimum of column 7 using
min_val <- min(as.numeric(df[, 7]), na.rm = TRUE)
Now how do I get the value from column 2 (variable name of column being 'column.2') corresponding to the row in which column 7 contains value of 'min_val' as calculated above?
This might be a trivial question but I am new to R so any help will be much appreciated.
Use which.min to get the minimum value index. Something like :
df[which.min(df[,7]),2]
Note that which.min only returns the first index of the minimum, so if you've got several rows with the same minimal value, you will only get the first one.
If you want to get all the minimum rows, you can use :
df[which(df[,7]==min(df[,7])), 2]
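The difference between the two calls shows up when the minimum is tied. The sketch below uses a small hypothetical two-column data frame (name and score) in place of the 8-column one from the question.

```r
# Hypothetical data with a tie for the minimum score
df <- data.frame(name  = c("a", "b", "c", "d"),
                 score = c(3, 1, 2, 1),
                 stringsAsFactors = FALSE)

# which.min returns only the first position of the minimum
first_min <- df[which.min(df$score), "name"]

# Comparing against min() returns every row that attains it
all_min <- df[which(df$score == min(df$score, na.rm = TRUE)), "name"]

first_min  # "b"
all_min    # "b" "d"
```

Wrapping the comparison in which() also drops rows where the column is NA, which a bare logical index would return as NA rows.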
The same answer as juba's, but using the data.table package (his answer uses only base R, with no need to load any packages).
# Load data.table
library(data.table)
# Convert df to a data.table so columns can be referenced by name inside [ ]
df <- as.data.table(df)
# Get the 2nd column's value corresponding to the first minimum value in the 7th column
df[which.min(V7), V2]
# Get all values in the 2nd column corresponding to the minimum value in the 7th column
df[V7 == min(V7), V2]
For handling data.frame-like objects, data.table is quite handy and helpful, just like the dplyr package. Both are worth a look.
Here I've assumed your columns are named V1..V8. Otherwise, just replace V7 and V2 with the names of the 7th and 2nd columns of your data, respectively.