I have a dataframe that I have to sort in decreasing order of absolute row value without changing the actual values (some of which are negative).
For example, for the 1st row, I would like to go from
-0.01189179 0.03687456 -0.12202753 to
-0.12202753 0.03687456 -0.01189179.
For the 2nd row from
-0.04220260 0.04129326 -0.07178175 to
-0.07178175 -0.04220260 0.04129326 etc.
How can I do this in R?
Many thanks!
Try this
lst <- lapply(df , \(x) order(-abs(x)))
ans <- data.frame(Map(\(x,y) x[y] , df ,lst))
output
            a           b
1 -0.12202753 -0.07178175
2  0.03687456 -0.04220260
3 -0.01189179  0.04129326
data
df <- structure(list(a = c(-0.12202753, 0.03687456, -0.01189179), b = c(-0.0422026,
0.04129326, -0.07178175)), row.names = c(NA, -3L), class = "data.frame")
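The answer above sorts within each column. If the values to sort lie along the rows, as the question words it, a minimal base R sketch could look like the following. The names m, sort_abs_desc and sorted are illustrative and not from the original posts; the two rows are the ones quoted in the question.

```r
# The two example rows quoted in the question, stored row-wise.
m <- rbind(c(-0.01189179, 0.03687456, -0.12202753),
           c(-0.04220260, 0.04129326, -0.07178175))

# Reorder one vector by decreasing absolute value, keeping the signs.
sort_abs_desc <- function(r) r[order(-abs(r))]

# apply() returns one column per input row, so transpose back with t().
sorted <- t(apply(m, 1, sort_abs_desc))
print(sorted)
```

For a data frame, as.matrix() before and as.data.frame() after the call should give the same result, assuming all columns are numeric.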
Here is a simple approach that reverses the row order (using @Mohamed Desouky's data)
df <- df[nrow(df):1,]
> df
a b
3 -0.01189179 -0.07178175
2 0.03687456 0.04129326
1 -0.12202753 -0.04220260
I have a problem.
I have the following data frame.
1        2
NA       100
1.00499  NA
1.00813  NA
0.99203  NA
It has two columns. In the second column, apart from the starting value, there are only NAs. I want to fill the first NA of the 2nd column by multiplying the 1st value of column 2 by the 2nd value of column 1 (100 * 1.00499). The 3rd value of column 2 should be the product of the newly created 2nd value of column 2 and the 3rd value of column 1, and so on, so that in the end all NAs are replaced by values.
The two sources below have helped me understand how to refer to different rows, but in both cases a new column is created. I don't want that; I want to fill the existing column 2.
Use a value from the previous row in an R data.table calculation
https://statisticsglobe.com/use-previous-row-of-data-table-in-r
Can anyone help me?
Thanks so much in advance.
Sample code
library(quantmod)
data.N225<-getSymbols("^N225",from="1965-01-01", to="2022-03-30", auto.assign=FALSE, src='yahoo')
data.N225[c(1:3, nrow(data.N225)),]
data.N225<- na.omit(data.N225)
N225 <- data.N225[,6]
N225$DiskreteRendite= Delt(N225$N225.Adjusted)
N225[c(1:3,nrow(N225)),]
options(digits=5)
N225.diskret <- N225[,3]
N225.diskret[c(1:3,nrow(N225.diskret)),]
N225$diskretplus1 <- N225$DiskreteRendite+1
N225[c(1:3,nrow(N225)),]
library(dplyr)
N225$normiert <-"Value"
N225$normiert[1,] <-100
N225[c(1:3,nrow(N225)),]
N225.new <- N225[,4:5]
N225.new[c(1:3,nrow(N225.new)),]
Here is the code to create the data frame in RStudio.
a <- c(NA, 1.0050,1.0081, 1.0095, 1.0016,0.9947)
b <- c(100, NA, NA, NA, NA, NA)
c<- data.frame(ONE = a, TWO=b)
You could use cumprod for the cumulative product
transform(
  df,
  TWO = cumprod(c(na.omit(TWO), na.omit(ONE)))
)
which yields
ONE TWO
1 NA 100.0000
2 1.0050 100.5000
3 1.0081 101.3140
4 1.0095 102.2765
5 1.0016 102.4402
6 0.9947 101.8972
data
> dput(df)
structure(list(ONE = c(NA, 1.005, 1.0081, 1.0095, 1.0016, 0.9947
), TWO = c(100, NA, NA, NA, NA, NA)), class = "data.frame", row.names = c(NA,
-6L))
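Why the cumprod approach works: each new TWO is the previous TWO times the current ONE, so row k holds the starting value times the running product of ONE[2..k]. The sketch below checks this against an explicit step-by-step accumulation with Reduce(). The names start, factors, by_cumprod and by_reduce are illustrative, and it assumes the starting value sits in the first row.

```r
df <- data.frame(ONE = c(NA, 1.0050, 1.0081, 1.0095, 1.0016, 0.9947),
                 TWO = c(100, NA, NA, NA, NA, NA))

start   <- df$TWO[1]   # starting value (assumed to be in row 1)
factors <- df$ONE[-1]  # multipliers for rows 2..n

# Running product in one vectorised call.
by_cumprod <- cumprod(c(start, factors))

# The same recursion spelled out: previous result times next factor.
by_reduce <- Reduce(`*`, factors, init = start, accumulate = TRUE)

print(all.equal(by_cumprod, by_reduce))
df$TWO <- by_cumprod   # fills the existing column, no new column needed
```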
What about (gasp) a for loop?
(I'll use dat instead of c for your dataframe to avoid confusion with function c()).
for (row in 2:nrow(dat)) {
  if (!is.na(dat$TWO[row-1])) {
    dat$TWO[row] <- dat$ONE[row] * dat$TWO[row-1]
  }
}
This means:
For each row from the second to the end, if the TWO in the previous row is not a missing value, calculate the TWO in this row by multiplying ONE in the current row and TWO from the previous row.
Output:
#> ONE TWO
#> 1 NA 100.0000
#> 2 1.0050 100.5000
#> 3 1.0081 101.3140
#> 4 1.0095 102.2765
#> 5 1.0016 102.4402
#> 6 0.9947 101.8972
Created on 2022-04-28 by the reprex package (v2.0.1)
I'd love to read a dplyr solution!
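One possible dplyr version, offered as a sketch rather than a definitive answer: it assumes the starting value is in the first row of TWO and that the only NA in ONE is the first entry, which coalesce() turns into a neutral factor of 1.

```r
library(dplyr)

dat <- data.frame(ONE = c(NA, 1.0050, 1.0081, 1.0095, 1.0016, 0.9947),
                  TWO = c(100, NA, NA, NA, NA, NA))

dat <- dat %>%
  # Replace the leading NA in ONE by 1, take the running product,
  # and scale it by the starting value in the first row of TWO.
  mutate(TWO = first(TWO) * cumprod(coalesce(ONE, 1)))

print(dat)
```

If ONE contained NAs further down, coalesce() would silently treat them as a factor of 1, so the for loop above may be the safer choice for messier data.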
I have a data frame of dimension 100 by 54, where the rows are stock values at the end of the week, and each column represents a stock. I want to replace each entry in my data frame with the return value of the stock, so divide the current value of the cell by the previous one, and replace the current value by the new value. Example: Say I have this data frame with these values,
table 1
I want to manipulate my data frame to be:
table 2
So that it can eventually look like this:
table 3
I have written this as my code, but it does not do that job. I was wondering if someone can help me.
Returns99 <- NULL
for(i in 2:100){
Returns99 <- rbind(Returns99, rep(NA, 54))
Returns <- rbind(Returns99, (df100[i, ]/df100[i-1,]))
}
Where df100 is the data frame with price entries.
You don't need a loop. With Base R,
rbind(NA, df100[-1,] / df100[-nrow(df100),])
gives,
AGG DBC DFE
1 NA NA NA
2 1.0000000 1.0000000 1.0000000
3 1.0021019 0.9739496 0.9990862
4 0.9993008 1.0008628 0.9911585
Data:
structure(list(AGG = c(99.91, 99.91, 100.12, 100.05), DBC = c(23.8,
23.8, 23.18, 23.2), DFE = c(65.66, 65.66, 65.6, 65.02)), class = "data.frame", row.names = c(NA,
-4L))
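If the goal is to overwrite the entries while keeping the original row names and dimensions, the same division can be assigned back in place. This is a sketch using the sample data above; the names n and returns are illustrative.

```r
df100 <- data.frame(AGG = c(99.91, 99.91, 100.12, 100.05),
                    DBC = c(23.8, 23.8, 23.18, 23.2),
                    DFE = c(65.66, 65.66, 65.6, 65.02))

n <- nrow(df100)
returns <- df100

# The first week has no previous value, so its return is undefined.
returns[1, ] <- NA

# Rows 2..n divided element-wise by rows 1..(n-1).
returns[-1, ] <- df100[-1, ] / df100[-n, ]

print(returns)
```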
I have a test_list
test_list <- list("hg38:Chr12:8823762", "hg38:Chr10:50814012", "hg19:Chr12:8990070",
"hg38:chr1:16949", "hg38:chr9:342484")
and I want to check whether each element of my list partially matches the column Extra_information in df
df <- structure(list(Extra_information = c("hg38:Chr10:50814012, hg19:Chr10:52573772, CpG:Mutation may have occured by deamination of methylated CpG dinucleotide",
"hg38:Chr12:8822661, hg19:Chr12:8975257, COM:Patient is homozygous for c.706C>G p.Leu236Val in SLC26A4., dbSNP:http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs1409944554",
"hg38:Chr12:8823729, hg19:Chr12:8976325, COM:Variant of unknown significance. Clinical features descr. in supplementary table 2. functional study., dbSNP:http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs766201825",
"hg38:Chr12:8823762, hg19:Chr12:8976358, COM:VUS Table 2. RIT1 variant also present.",
"hg38:Chr12:8835642, hg19:Chr12:8988238, COM:VUS Table 2. SOS1 and CBL variants also present., dbSNP:http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs11047499",
"hg38:Chr12:8837474, hg19:Chr12:8990070, dbSNP:http://www.ncbi.nlm.nih.gov/SNP/snp_ref.cgi?type=rs&rs=rs863224952"
)), row.names = c(NA, 6L), class = "data.frame")
to obtain a dataframe of my list with the values 1 for TRUE and 0 for FALSE:
test_df <- structure(list(Entries = c("hg38:Chr12:8823762", "hg38:Chr10:50814012", "hg19:Chr12:8990070",
"hg38:chr1:16949", "hg38:chr9:342484"), Values = c(1,1,1,0,0)), row.names = c(NA, 5L), class = "data.frame")
How can I achieve the desired output ?
Thanks in advance.
Here's a base R approach.
data.frame(Entries = unlist(test_list),
           Values = sapply(test_list, function(x) {
             as.numeric(length(grep(x, df$Extra_information)) > 0)
           }))
# Entries Values
#1 hg38:Chr12:8823762 1
#2 hg38:Chr10:50814012 1
#3 hg19:Chr12:8990070 1
#4 hg38:chr1:16949 0
#5 hg38:chr9:342484 0
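Since the entries are literal coordinates rather than regular expressions, a variant with grepl(fixed = TRUE) may be a little more robust. The sketch below uses a trimmed copy of df (only the leading coordinates of each row are kept, to stay short); the names entries, values and result are illustrative.

```r
test_list <- list("hg38:Chr12:8823762", "hg38:Chr10:50814012",
                  "hg19:Chr12:8990070", "hg38:chr1:16949", "hg38:chr9:342484")

# Trimmed version of the question's data: coordinates only.
df <- data.frame(Extra_information = c(
  "hg38:Chr10:50814012, hg19:Chr10:52573772",
  "hg38:Chr12:8822661, hg19:Chr12:8975257",
  "hg38:Chr12:8823729, hg19:Chr12:8976325",
  "hg38:Chr12:8823762, hg19:Chr12:8976358",
  "hg38:Chr12:8835642, hg19:Chr12:8988238",
  "hg38:Chr12:8837474, hg19:Chr12:8990070"))

entries <- unlist(test_list)

# 1 if the entry occurs as a literal substring of any row, else 0.
values <- vapply(entries, function(p) {
  as.integer(any(grepl(p, df$Extra_information, fixed = TRUE)))
}, integer(1), USE.NAMES = FALSE)

result <- data.frame(Entries = entries, Values = values)
print(result)
```

Note that this is still substring matching, so an entry would also match a longer coordinate that merely begins with the same digits.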
We could use agrepl (base R only), which performs approximate (fuzzy) matching, to check for partial matches between the 'test_list' elements and 'Extra_information'
Values <- +(sapply(test_list, function(x) any(agrepl(x, df$Extra_information))))
data.frame(Entries = unlist(test_list), Values)
# Entries Values
#1 hg38:Chr12:8823762 1
#2 hg38:Chr10:50814012 1
#3 hg19:Chr12:8990070 1
#4 hg38:chr1:16949 0
#5 hg38:chr9:342484 0
Say I have a dataframe of tens of columns, and my custom function needs each one of these columns plus a number in a vector to give me the desired output. After being done with all that, I need to generate new column names based on the original column names in the dataframe. How to accomplish this using the tidyverse, instead of for loops or other solutions in base R.
MWE
structure(list(col1 = c(36.0520583373645, 37.9423749063706, 33.6806634587719,
34.031649012457, 29.5448679963449, NA, 34.7576769718877, 30.484217745574,
32.9849083643022, 27.4081694831058, 35.8624919654559, 35.0284347997991,
NA, 32.112605893241, 27.819354948082, 35.6499532124921, 35.0265642403216,
32.4006569441297, 30.3698557864842, 31.8229364456928, 34.3715903109276
), col2 = c(32.9691195198199, 35.6643664156284, 33.8748732989736,
34.5436311813644, 33.2228201914256, 38.7621696867191, 34.8399804318992,
32.9063078995457, 35.7391166214367, 32.7217251282669, 36.3039268989853,
35.9607654868559, 33.1385915196435, 34.7987649028199, 33.7100463668523,
34.7773403671057, 35.8592997980752, 33.8537127786535, 31.9106243803505,
39.3099469314882, 35.1849826815196), col3 = c(33.272278716963,
NA, 31.8594920410129, 33.1695042551974, 29.3800694974438, 35.1504378875245,
34.0771487001433, 29.0162879030415, 30.6960024888799, 29.5542117965184,
34.3726321365982, 36.0602274148362, 33.1207772548047, 31.5506876209822,
28.8649303491974, 33.4598790144265, 30.5573454464747, 31.6026723913051,
30.4716061556625, 33.009463000301, 30.846230953425)), row.names = c(NA,
-21L), class = "data.frame")
Save the above in a file, then read the data frame with example <- dget(file.choose()).
Code
library(dplyr)  # provides %>% and mutate()
y <- c(2, 1, 1.5)
customfun <- function(x, y) {
  n <- log(x) * y
  print(n)  # prints n and returns it invisibly
}
df <- example %>%
  dplyr::mutate(col1.log = customfun(col1, y = y[1])) %>%
  dplyr::mutate(col2.log = customfun(col2, y = y[2])) %>%
  dplyr::mutate(col3.log = customfun(col3, y = y[3]))
Question
Imagine I have tens of these columns not only 3 as in the MWE, how to generate the new ones dynamically using the tidyverse?
We can use map2 and bind_cols to add new columns
library(dplyr)
library(purrr)
bind_cols(example, map2_df(example, y, customfun) %>%
rename_all(~paste0(., ".log")))
# col1 col2 col3 col1.log col2.log col3.log
#1 36.05206 32.96912 33.27228 7.169928 3.495571 5.257087
#2 37.94237 35.66437 NA 7.272137 3.574152 NA
#3 33.68066 33.87487 31.85949 7.033848 3.522674 5.192003
#4 34.03165 34.54363 33.16950 7.054582 3.542223 5.252446
#...
The tidyverse is not great for these sweep()-like operations. However, one option could be:
example %>%
do(., sweep(., 2, FUN = customfun, y)) %>%
rename_all(~ paste(., "log", sep = "."))
col1.log col2.log col3.log
1 7.169928 3.495571 5.257087
2 7.272137 3.574152 NA
3 7.033848 3.522674 5.192003
4 7.054582 3.542223 5.252446
5 6.771820 3.503237 5.070475
6 NA 3.657445 5.339456
7 7.096801 3.550766 5.292941
8 6.834418 3.493664 5.051786
9 6.992100 3.576246 5.136199
10 6.621682 3.488039 5.079339
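Another option, sketched here under a few assumptions, is across() with cur_column(): giving y the column names lets each column look up its own multiplier. The sample values are the first three rows of the data as printed above (rounded), customfun is simplified to return its result without printing, and the name result is illustrative.

```r
library(dplyr)

# First three rows of the example data, rounded as printed above.
example <- data.frame(col1 = c(36.05206, 37.94237, 33.68066),
                      col2 = c(32.96912, 35.66437, 33.87487),
                      col3 = c(33.27228, NA, 31.85949))

y <- c(2, 1, 1.5)
names(y) <- names(example)  # assumption: y is in column order

customfun <- function(x, y) log(x) * y

result <- example %>%
  mutate(across(everything(),
                ~ customfun(.x, y[[cur_column()]]),  # multiplier by column name
                .names = "{.col}.log"))

print(result)
```

The .names argument builds the new column names from the originals, so this should scale to tens of columns without listing them.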
I have two columns, both of character type.
One column holds plain strings and the other holds strings wrapped in quotes.
I want to compare both columns and find the number of distinct names across the data frame.
string f.string.name
john NA
bravo NA
NA "john"
NA "hulk"
Here the count should be 2, as john is common.
Somehow I am not able to remove the quotes from the second column, and I am not sure why.
Thanks
The main problem I'm seeing is the NA values.
First, let's get rid of the quotes you mention.
dat$f.string.name <- gsub('["]', '', dat$f.string.name)
Now, count the number of distinct values.
i1 <- complete.cases(dat$string)
i2 <- complete.cases(dat$f.string.name)
sum(dat$string[i1] %in% dat$f.string.name[i2]) + sum(dat$f.string.name[i2] %in% dat$string[i1])
DATA
dat <-
structure(list(string = c("john", "bravo", NA, NA), f.string.name = c(NA,
NA, "\"john\"", "\"hulk\"")), .Names = c("string", "f.string.name"
), class = "data.frame", row.names = c(NA, -4L))
library(stringr)
table(str_replace_all(unlist(dat), '["]', ''))
# bravo hulk john
# 1 1 2
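The same tabulation can be sketched in base R without stringr, using the dat object from the DATA block above. The names cleaned and counts are illustrative; table() drops NA values by default.

```r
dat <- data.frame(string = c("john", "bravo", NA, NA),
                  f.string.name = c(NA, NA, "\"john\"", "\"hulk\""),
                  stringsAsFactors = FALSE)

# Flatten both columns and strip the literal double quotes.
cleaned <- gsub('"', '', unlist(dat), fixed = TRUE)

# Count how often each name occurs across both columns.
counts <- table(cleaned)
print(counts)

# Names that appear in both columns have a count of 2.
print(names(counts)[counts == 2])
```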