How to find the row with the maximum number of values, column-wise, in R

I have a data frame that has thousands of rows and 25 columns. The rows have different lengths, meaning that not all have values for all columns. However, the empty cells are always at the end of the row:
1 1 1 1
1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1
How can I find the longest row? In the example above it would be row number 3.

If the missing values are NA (they are all at the end of each row, so the number of non-NA values is the row length):
indx <- which.max(rowSums(!is.na(df1)))
Or we can use apply to loop over the rows
indx <- which.max(apply(df1, 1, function(x) length(x[!is.na(x)])))
If the missing values are empty strings (''):
indx <- which.max(apply(df1, 1, function(x) length(x[x!=''])))
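As a quick check of the NA variant, here is a minimal sketch that rebuilds the four example rows from the question. The matrix construction and the names lens, counts and all_longest are illustrative assumptions, not part of the original post. Because which.max only reports the first maximum, the last line shows one way to list every row that ties for the longest.

```r
# Hypothetical sample data: row lengths 4, 3, 8, 6, padded with trailing NAs
lens <- c(4, 3, 8, 6)
m <- matrix(NA_real_, nrow = length(lens), ncol = max(lens))
for (i in seq_along(lens)) m[i, seq_len(lens[i])] <- 1
df1 <- as.data.frame(m)

# Number of non-NA values per row
counts <- rowSums(!is.na(df1))

# Position of the first longest row
indx <- which.max(counts)

# Positions of all rows that share the maximum length
all_longest <- which(counts == max(counts))
```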

Related

If I want to sort a column by size in RStudio, how do I make sure that the associated values of the rows sort with the column?

I have a data.frame with 1200 rows and 5 columns, where each row contains 5 values for one person. I need to sort one column by size, but I want the remaining columns to be reordered along with it, so that one column is sorted by increasing values and each row still contains the data of one and the same person.
colnames(BAPlotDET) = c("fsskiddet", "fspiddet","avg", "diff","absdiff")
These are the column names of my data.frame, and I want to sort it by the column called "avg".
First of all, please always provide a reproducible example such as the one below. Indexing a data frame with order() reorders whole rows, so every column stays aligned with the sorted column.
vector <- 1:3
BAPlotDET <- data.frame(vector, vector, vector, vector, vector)
colnames(BAPlotDET) = c("fsskiddet", "fspiddet","avg", "diff","absdiff")
  fsskiddet fspiddet avg diff absdiff
1         1        1   1    1       1
2         2        2   2    2       2
3         3        3   3    3       3
# order(-x) sorts in decreasing order; use order(BAPlotDET$avg) for increasing order
BAPlotDET <- BAPlotDET[order(-BAPlotDET$avg),]
> BAPlotDET
  fsskiddet fspiddet avg diff absdiff
3         3        3   3    3       3
2         2        2   2    2       2
1         1        1   1    1       1
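Since the question asks for increasing values, a small sketch with made-up, non-identical values may make it easier to see that the rows move as a unit. The numbers and the name sorted are assumptions for illustration only.

```r
# Hypothetical data: three persons, one row each
BAPlotDET <- data.frame(
  fsskiddet = c(11, 12, 13),
  fspiddet  = c(21, 22, 23),
  avg       = c(2.5, 1.0, 3.2),
  diff      = c(0.4, -0.2, 0.1),
  absdiff   = c(0.4, 0.2, 0.1)
)

# order() returns the row positions that put avg in increasing order;
# using them as the row index reorders every column together
sorted <- BAPlotDET[order(BAPlotDET$avg), ]

# For decreasing order, one option is order(BAPlotDET$avg, decreasing = TRUE)
```

Each row of sorted still holds the five values of a single person; only the row order has changed.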

Sum rows at specific column intervals

I have a large data frame of 1129 rows and 4662 columns. I want to sum the row values over consecutive blocks of 3 columns, and then return 1 for each block if its row sum is > 0, or 0 if the sum is < 1. I have added a small reproducible example below. I would like to sum the row values of columns 1 to 3, then the row values of columns 4 to 6 (and so on in my real data).
df <- read.table(text =" 2005-09-23_2005-09-26 2005-09-27_2005-10-30 2005-10-07_2005-10-08 2005-10-09_2005-10-10 2005-10-11_2005-10-12 2005-10-13_2005-10-14
1 1 0 1 1 1 1
2 1 1 0 0 0 0
3 NA NA NA NA NA 0", header = TRUE)
The result I am after would be this:
result <- read.table(text =" 2005-09-23_2005-10-08 2005-10-09_2005-10-14
1 1 1
2 1 0
3 NA 0", header = TRUE)
I looked for similar questions, and it seems that rollapply or rowsum could work (see "R: summing over an interval of rows"), but I can't find a way to sum over blocks of columns instead of rows, nor how to repeat this across the whole data frame. Would someone be so kind as to help me with some code for doing this? Thank you very much!
This works only if the number of columns is divisible by the interval.
+(sapply(split.default(df,unlist(lapply(1:(ncol(df)/3),rep,3))),rowSums) > 0)
   1  2
1  1  1
2  1  0
3 NA NA
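Maybe someone else can find a more elegant way of creating the split than unlist(lapply(1:(ncol(df)/3),rep,3)). Note that rowSums returns NA for any block that contains an NA, so row 3 comes out as NA NA rather than the requested NA 0.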
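One possible way to build the grouping more compactly, and to get NA only when a whole block is missing, is sketched below. It is not the original answerer's code: the short column names a to f, the helper flag_block and the variable grp are assumptions made for illustration, and the NA rule (NA only if every value in the block is NA) is inferred from the expected result in the question.

```r
# Hypothetical stand-in for the question's data, with short column names
df <- data.frame(
  a = c(1, 1, NA), b = c(0, 1, NA), c = c(1, 0, NA),
  d = c(1, 0, NA), e = c(1, 0, NA), f = c(1, 0, 0)
)

# Block id per column: 1 1 1 2 2 2; ceiling() also copes with a
# trailing block that has fewer than 3 columns
grp <- ceiling(seq_along(df) / 3)

# 1 if the block's row sum is > 0, 0 otherwise, NA if the block is all NA
flag_block <- function(block) {
  s <- rowSums(block, na.rm = TRUE)
  all_na <- rowSums(!is.na(block)) == 0
  ifelse(all_na, NA_integer_, as.integer(s > 0))
}

# split.default() splits the data frame by columns, one piece per block
res <- sapply(split.default(df, grp), flag_block)
```

With this input, res has one column per block and one row per original row, with NA only in row 3 of the first block.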

Subtract 1 from the values of two columns

How can I subtract 1 from the values of two columns of a data.frame?
So far I couldn't find anything about how to address a column of a data.frame and subtract 1 from all of its values.
myData:
src target
1 1
2 2
3 3
4 4
Should become:
src target
0 0
1 1
2 2
3 3
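If we need to subtract 1 from the last two columns, extract their names with tail and subtract 1 from that subset (df1 stands for the question's myData; df2 then holds only those two columns):
nm1 <- tail(names(df1), 2)
df2 <- df1[nm1] - 1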
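If the data frame has other columns that should be kept, the same subtraction can be assigned back in place. The following is a small sketch under that assumption; the extra id column and the name cols are hypothetical.

```r
# Hypothetical data: the question's src/target plus an extra column to keep
myData <- data.frame(
  id     = c("a", "b", "c", "d"),
  src    = 1:4,
  target = 1:4
)

# Columns to modify, addressed by name
cols <- c("src", "target")

# Subtract 1 from every value in those columns, leaving the rest untouched
myData[cols] <- myData[cols] - 1
```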

For loop to paste rows to create new dataframe from existing dataframe

New to SO, but can't figure out how to get this code to work. I have a dataframe that is very large, and is set up like this:
Number Year Type Amount
1 1 A 5
1 2 A 2
1 3 A 7
1 4 A 1
1 1 B 5
1 2 B 11
1 3 B 0
1 4 B 2
This continues for multiple numbers. I want to take this dataframe and make a new dataframe that puts two rows side by side, nested within types and numbers, so that each combination of two years appears once (for example, row 1 and row 2, row 1 and row 3, row 1 and row 4, row 2 and row 3, row 2 and row 4, row 3 and row 4).
Example output:
Number Year Type Amount Number Year Type Amount
1 1 A 5 1 2 A 2
1 1 A 5 1 3 A 7
1 1 A 5 1 4 A 1
1 2 A 2 1 3 A 7
1 2 A 2 1 4 A 1
1 3 A 7 1 4 A 1
I thought that I would do a for loop to loop within number and type, but I do not know how to make the rows paste from there, or how to ensure that I am only getting the combinations of the rows once. For example:
for(i in 1:n_number){
for(j in 1:n_type){
....}}
Any tips would be appreciated! I am relatively new to coding, so I don't know if I should be using a for loop at all. Thank you!
df <- data.frame(Number = rep(1, 8),
                 Year   = rep(c(1:4), 2),
                 Type   = rep(c('A', 'B'), each = 4),
                 Amount = c(5, 2, 7, 1, 5, 11, 0, 2))
My interpretation is that you want to create a dataframe with all row combinations, where Number and Type are the same and Year is different.
First suggestion - join on Number and Type, then keep only the row pairs whose Year differs. I added an index to prevent redundant matches (1 with 2 and 2 with 1).
df$index <- 1:nrow(df)
out <- merge(df,df,by=c("Number","Type"))
# index.x < index.y keeps each pair once, with the earlier row on the left as in the example output
out <- out[out$index.x < out$index.y & out$Year.x != out$Year.y, ]
Second suggestion - if you want to see a version using a loop.
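out2 <- NULL
for (i in seq_len(nrow(df) - 1)) {
  for (j in (i + 1):nrow(df)) {
    # keep the pair only if Number and Type match and Year differs
    if (df[i, "Year"] != df[j, "Year"] && df[i, "Number"] == df[j, "Number"] && df[i, "Type"] == df[j, "Type"]) {
      # growing out2 with rbind is slow for very large data
      out2 <- rbind(out2, cbind(df[i, ], df[j, ]))
    }
  }
}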
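Another option, sketched here as an alternative to the two suggestions rather than a replacement for them, is to split the data by Number and Type and let combn enumerate each pair of rows once. The helper pair_rows, the ".2" suffix and the name out3 are assumptions chosen for illustration.

```r
# Example data from the answer above
df <- data.frame(Number = rep(1, 8),
                 Year   = rep(1:4, 2),
                 Type   = rep(c("A", "B"), each = 4),
                 Amount = c(5, 2, 7, 1, 5, 11, 0, 2))

# Pair up the rows of one Number/Type group, each combination once
pair_rows <- function(g) {
  if (nrow(g) < 2) return(NULL)      # nothing to pair
  idx <- combn(nrow(g), 2)           # 2 x k matrix of row positions
  left  <- g[idx[1, ], ]
  right <- g[idx[2, ], ]
  names(right) <- paste0(names(right), ".2")  # avoid duplicate column names
  cbind(left, right)
}

# One data frame per Number/Type combination, then stack the results
groups <- split(df, list(df$Number, df$Type), drop = TRUE)
out3 <- do.call(rbind, lapply(groups, pair_rows))
rownames(out3) <- NULL
```

For the example data this gives six pairs per type, with the earlier year always in the left-hand columns.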

Assigning values to correlative series in r

I hope you can help me with this issue I have.
I have a big dataframe; simplified, it looks like this:
df <- data.frame(radius = c (2,3,5,7,4,6,9,8,3,7,8,9,2,4,5,2,6,7,8,9,1,10,8))
df$num <- c(1,2,3,4,5,6,7,8,9,10,11,1,12,13,1,14,15,16,17,18,19,1,1)
df
The column $num contains correlative (consecutive) series (1-11, 1, 12-13, 1, 14-19, 1, 1).
I would like to assign an increasing id to each correlative series, stored as a new column. The outcome should look like this:
df$outcome <- c(1,1,1,1,1,1,1,1,1,1,1,2,3,3,4,5,5,5,5,5,5,6,7)
df
thanks a lot!
A.
We can get the difference between adjacent elements of 'num' using diff and check whether it is not equal to 1, which marks the start of a new series. The logical output is one element shorter than the 'num' vector, so we pad it with a leading 'TRUE' and take the cumsum to get the expected output.
df$outcome <- cumsum(c(TRUE,diff(df$num)!=1))
df$outcome
#[1] 1 1 1 1 1 1 1 1 1 1 1 2 3 3 4 5 5 5 5 5 5 6 7
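To make the individual steps visible, here is the same idea on a shorter, made-up vector. The names num, steps, breaks and run_id are assumptions used only for this sketch.

```r
# Hypothetical short input: series 1-3, then 1, then 4-5, then 1, then 1
num <- c(1, 2, 3, 1, 4, 5, 1, 1)

# Difference between neighbours: 1 1 -2 3 1 -4 0
steps <- diff(num)

# A new series starts at position 1 and wherever the step is not 1
breaks <- c(TRUE, steps != 1)

# Running count of series starts gives the series id: 1 1 1 2 3 3 4 5
run_id <- cumsum(breaks)
```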
