Subtract the value of two columns by 1 from a variable - r

How to subtract from a data.frame the value of two columns by 1
So far I couldn't find anything about how to address a column from a data.frame and subtract all values at one
myData:
src target
1 1
2 2
3 3
4 4
Should become:
src target
0 0
1 1
2 2
3 3

If we need to subtract the last two columns, get the last columns extracted with tail
nm1 <- tail(names(df1), 2)
df2 <- df1[nm1] -1

Related

if i want to sort a column by size in rstudio, how do i make sure that the associated values of the rows sort with the column?

I have a data.frame with 1200 rows and 5 columns, where each row contains 5 values of one person. now i need to sort one column by size but I want the remaining columns to sort with the column, so that one column is sorted by increasing values and the other columns contain the values of the right persons. ( So that one row still contains data from one and the same person)
colnames(BAPlotDET) = c("fsskiddet", "fspiddet","avg", "diff","absdiff")
these are the column names of my data.frame and I wanna sort it by the column called "avg"
First of all, please always provide us with a reproducible example such as below. The sorting of a data frame by default sorts all columns.
vector <- 1:3
BAPlotDET <- data.frame(vector, vector, vector, vector, vector)
colnames(BAPlotDET) = c("fsskiddet", "fspiddet","avg", "diff","absdiff")
fsskiddet fspiddet avg diff absdiff
1 1 1 1 1 1
2 2 2 2 2 2
3 3 3 3 3 3
BAPlotDET <- BAPlotDET[order(-BAPlotDET$avg),]
> BAPlotDET
fsskiddet fspiddet avg diff absdiff
3 3 3 3 3 3
2 2 2 2 2 2
1 1 1 1 1 1

For loop to paste rows to create new dataframe from existing dataframe

New to SO, but can't figure out how to get this code to work. I have a dataframe that is very large, and is set up like this:
Number Year Type Amount
1 1 A 5
1 2 A 2
1 3 A 7
1 4 A 1
1 1 B 5
1 2 B 11
1 3 B 0
1 4 B 2
This goes onto multiple for multiple numbers. I want to take this dataframe and make a new dataframe that has two of the rows together, but it would be nested (for example, row 1 and row 2, row 1 and row 3, row 1 and row 4, row 2 and row 3, row 2 and row 4) where each combination of each year is together within types and numbers.
Example output:
Number Year Type Amount Number Year Type Amount
1 1 A 5 1 2 A 2
1 1 A 5 1 3 A 7
1 1 A 5 1 4 A 1
1 2 A 2 1 3 A 7
1 2 A 2 1 4 A 1
1 3 A 7 1 4 A 1
I thought that I would do a for loop to loop within number and type, but I do not know how to make the rows paste from there, or how to ensure that I am only getting the combinations of the rows once. For example:
for(i in 1:n_number){
for(j in 1:n_type){
....}}
Any tips would be appreciated! I am relatively new to coding, so I don't know if I should be using a for loop at all. Thank you!
df <- data.frame(Number= rep(1,8),
Year = rep(c(1:4),2),
Type = rep(c('A','B'),each=4),
Amount=c(5,2,7,1,5,11,0,2))
My interpretation is that you want to create a dataframe with all row combinations, where Number and Type are the same and Year is different.
First suggestion - join on Number and Type, then remove rows that have different Year. I added an index to prevent redundant matches (1 with 2 and 2 with 1).
df$index <- 1:nrow(df)
out <- merge(df,df,by=c("Number","Type"))
out <- out[which(out$index.x>out$index.y & out$Year.x!=out$Year.y),]
Second suggestion - if you want to see a version using a loop.
out2 <- NULL
for (i in c(1:(nrow(df)-1))){
for (j in c((i+1):nrow(df))){
if(df[i,"Year"]!=df[j,"Year"] & df[i,"Number"]==df[j,"Number"] & df[i,"Type"]==df[j,"Type"]){
out2 <- rbind(out2,cbind(df[i,],df[j,]))
}
}
}

How to tidy up a character column?

What I have:
test_df <- data.frame(isolate=c(1,2,3,4,1,2,3,4,5),label=c(1,1,1,1,2,2,2,2,2),alignment=c("--at","at--","--at","--at","a--","acg","a--","a--", "agg"))
> test_df
isolate label alignment
1 1 1 --at
2 2 1 at--
3 3 1 --at
4 4 1 --at
5 1 2 a--
6 2 2 acg
7 3 2 a--
8 4 2 a--
9 5 2 agg
What I want:
I'd like to explode the alignment field into two columns, position and character:
> test_df
isolate label aln_pos aln_char
1 1 1 1 -
2 1 1 2 -
3 1 1 3 a
4 1 1 4 t
...
Not all alignments are the same length, but all alignments with the same label have the same length.
What I've tried:
I was thinking I could use separate to first make each position have its own column, then use gather turn those columns into key value pairs. However, I haven't been able to get the separate part right.
Since you mentioned tidyr::gather, you could try this:
test_df <- data.frame(isolate=c(1,2,3,4,1,2,3,4,5),
label=c(1,1,1,1,2,2,2,2,2),
alignment=c("--at","at--","--at","--at","a--","acg","a--","a--", "agg"),
stringsAsFactors = FALSE)
library(tidyverse)
test_df %>%
mutate(alignment = strsplit(alignment,"")) %>%
unnest(alignment)
In base R, you can use indexing along with creation of a list with strsplit like this.
# make variable a character vector
test_df$alignment <- as.character(test_df$alignment)
# get list of individual characters
myList <- strsplit(test_df$alignment, split="")
then build the data.frame
# construct data.frame
final_df <- cbind(test_df[rep(seq_len(nrow(test_df)), lengths(myList)),
c("isolate", "label")],
aln_pos=sequence(lengths(myList)),
aln_char=unlist(myList))
Here, we take the first two columns of the original data.frame and repeat the rows using rep with a vector input in its second argument telling it how many times to repeat the corresponding value in its first argument. The number of times is calculated with lengths. The second argument of cbind is a call to sequence taking the same lengths output. this produces counts from 1 to the corresponding length. The third argument is the unlisted character values.
this returns
head(final_df, 10)
isolate label aln_pos aln_char
1 1 1 1 -
1.1 1 1 2 -
1.2 1 1 3 a
1.3 1 1 4 t
2 2 1 1 a
2.1 2 1 2 t
2.2 2 1 3 -
2.3 2 1 4 -
3 3 1 1 -
3.1 3 1 2 -

How to find the rows with maximum number of variables columnwise r

I have a data frame that has thousands of rows and 25 columns. The rows have different lengths, meaning that not all have values for all columns. However, the empty cells are always at the end of the row:
1 1 1 1
1 1 1
1 1 1 1 1 1 1 1
1 1 1 1 1 1
How can I find the longest row? In the example above it would be row number 3.
If the missing values are NAs and all those values are at the end.
indx <- which.max(rowSums(!is.na(df1)))
Or we can use apply to loop over the rows
indx <- which.max(apply(df1, 1, function(x) length(x[!is.na(x)])))
If the missing values are ''
indx <- which.max(apply(df1, 1, function(x) length(x[x!=''])))

How to match 1 column to 2 columns?

I'm trying to match numbers from one column to numbers in two other columns. I can do this just fine when matching to only a single column, but have problems extending to two columns. Here is what I am doing:
I have 2 dataframes, df1:
number value
1
2
3
4
5
and df2:
number_a number_b value
3 3
1 5
5 1
4 2
2 4
What I want to do is match column "number" from df1 to EITHER "number_a" or number_b" in df2, then insert "value" from df2 into "value" of df1, to give the result df1 as:
number value
1 5
2 4
3 3
4 2
5 1
My approach is to use
df1$value <- df2$value[match(df1$number, df2$number_a)]
or
df1$value <- df2$value[match(df1$number, df2$number_b)]
which yields, respectively, for df1
number value
1 NA
2 NA
3 3
4 NA
5 1
and
number value
1 5
2 4
3 NA
4 2
5 NA
However, I can't seem to fill in all of the "value" column in df1 using this approach. How can I match "number" to "number_a" and "number_b" in one fell swoop. I tried
df1$value <- df2$value[match(df1$number, df2$number_a:number_b)]
but that didn't work.
Thanks!
Easier solution:
df2$number <- ifelse(is.na(df2$number_a), df2$number_b, df2$number_a)
If you're not familiar with ifelse, it works with vectors in the form:
ifelse(Condition, ValueIfTrue, ValueIfFalse)
I am a newbie to R (coming from several years with C). Was trying out the suggestions and I thought I would paste what I came up with:
// Assuming either 'number_a' or 'number_b' is valid
// Combine into new column 'number' and delete them original columns
df2 <- transform(df2, number = ifelse(is.na(df2$number_a), df2$number_b,
df2$number_a))[-c(1:2)]
// Combine the two data frames by the column 'number'
df <- merge(df1, df2, by = "number")
number value
1 5
2 4
3 3
4 2
5 1

Resources