Paste date in new column if condition is true in another column in R [duplicate]

This question already has an answer here:
Replace value using index [R]
(1 answer)
Closed 2 years ago.
I want to extract the date from a variable if the condition in another variable is true.
Example: if comorbidity1==10, extract the date from smr_01, otherwise NA
I also need to do this for the case where comorbidity1==11 OR comorbidity1==12: extract the date from smr_01, otherwise NA.
This is what I want my data to look like
comorbidity1 smr_01 NewDate
1 20120607 NA
10 20120607 20120607
10 20120613 20120613
3 20121103 NA
6 20150607 NA
12 20140509 NA
11 20120405 NA
I have tried this
fulldata$NewDate<-ifelse(fulldata$comorbidity1==10, fulldata$smr_01, NA)
but it does not paste the date in the correct format.
What I am getting looks like this:
comorbidity1 smr_01 NewDate
1 20120607 NA
10 20120607 4675
10 20120613 17856
3 20121103 NA
6 20150607 NA
12 20140509 NA
11 20120405 NA
smr_01 has class Date.
Thank you

Try :
df$NewDate <- as.Date(NA)
inds <- df$comorbidity1 == 10
#For more than 1 value use %in%
#inds <- df$comorbidity1 %in% 10:12
df$NewDate[inds] <- df$smr_01[inds]
df
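The reason the original attempt returns numbers is that ifelse() drops attributes, so a Date comes back as its underlying day count. The sketch below illustrates this and applies the index-assignment approach to the 11-or-12 case; the sample data is hypothetical and only mirrors the layout in the question.

```r
# Hypothetical sample data mirroring the question; smr_01 is stored as Date.
fulldata <- data.frame(
  comorbidity1 = c(1, 10, 10, 3, 6, 12, 11),
  smr_01 = as.Date(c("2012-06-07", "2012-06-07", "2012-06-13", "2012-11-03",
                     "2015-06-07", "2014-05-09", "2012-04-05"))
)

# ifelse() drops the Date class, so the result is a plain numeric vector
# (days since 1970-01-01) rather than a date.
bad <- ifelse(fulldata$comorbidity1 == 10, fulldata$smr_01, NA)
class(bad)

# Starting from a Date-typed NA column keeps the class intact.
fulldata$NewDate <- as.Date(NA)
inds <- fulldata$comorbidity1 %in% c(11, 12)
fulldata$NewDate[inds] <- fulldata$smr_01[inds]
class(fulldata$NewDate)
```

Because the new column is created as a Date before any values are copied in, the assignment never has to guess the type.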

Related

How to turn characters into numbers from a column in a dataset? [duplicate]

This question already has answers here:
Test if a vector contains a given element
(8 answers)
Replace logical values (TRUE / FALSE) with numeric (1 / 0)
(7 answers)
Closed 1 year ago.
I have a dataframe in the following form:
Date equity company press Categorization Year Month Event greenwashing
<chr> <dbl> <chr> <chr> <chr> <dbl> <chr> <dbl> <chr>
1 07/30/21 153. JPMorgan NA NA NA NA NA 0
2 07/29/21 153 JPMorgan NA NA NA NA NA 0
3 07/28/21 152. JPMorgan NA NA NA NA NA 0
4 07/27/21 151. JPMorgan NA NA NA NA NA 0
5 07/26/21 152. JPMorgan NA NA NA NA NA 0
6 07/23/21 151. JPMorgan NA NA NA NA NA 0
In the column 'greenwashing', some values are character strings such as The Guardian, Financial Times, among others. I need to turn these strings into 1.
I already tried to name a word list and use the if else code:
word.list = c("Financial Times",
"The Guardian",
"Mena Report",
"States News Service",
"US Newshire",
"DeSmogBlog",
"PR Newshire",
"The New York Times")
if(word.list){
print("1")
We can do this easily by converting the logical to integer with +
df$greenwashing <- +(df$greenwashing %in% word.list)
Using tidyverse:
Let's say your data frame is df, working off your attempt:
library(tidyverse)
# matched names become 1; all other values are kept as they are
df <- mutate(df, greenwashing = ifelse(greenwashing %in% word.list, 1, greenwashing))
You can try -
df$greenwashing <- as.integer(df$greenwashing %in% word.list)
This will change all the word.list values to 1 and the rest to 0.
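As a minimal sketch of how the %in% answers behave, the example below uses a shortened word list and a hypothetical greenwashing vector (neither is the asker's real data) and checks that the unary plus and as.integer() give the same result.

```r
# Shortened, hypothetical inputs for illustration only.
word.list <- c("Financial Times", "The Guardian")
greenwashing <- c("0", "The Guardian", "0", "Financial Times")

# %in% returns a logical vector; as.integer() maps TRUE/FALSE to 1/0.
flag <- as.integer(greenwashing %in% word.list)
flag

# The unary plus performs the same logical-to-integer conversion.
same <- identical(flag, +(greenwashing %in% word.list))
same
```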

How do I indicate/select a certain column in tibble? [duplicate]

This question already has answers here:
Subset / filter rows in a data frame based on a condition in a column
(3 answers)
Closed 2 years ago.
position price model url
<int> <chr> <chr> <chr>
1 1 "\nab 1.699,00 €\~ "\nGROUND CONTROL\n" NA
2 2 "\nab 1.999,00 €\~ "\nROOT MILLER\n" NA
3 3 "\nab 3.099,00 €\~ "\nPIKES PEAK\n" NA
4 4 "\n" "\nTHE BRUCE\n" NA
5 5 "\n" "\nCOUNT SOLO\n" NA
6 6 "\nab 1.849,00 €\~ "\nPSYCHO PATH\n" NA
7 7 "\nab 2.599,00 €\~ "\nTHRILL HILL\n" NA
8 8 "\nab 2.899,00 €\~ "\nTHRILL HILL TRAI~ NA
9 9 "\nab 2.149,00 €\~ "\nSOUL FIRE\n" NA
This is a 33x4 tibble I created. I would like to drop every row without price information, e.g. the 4th and 5th rows (they have no data because the product is not for sale). I thought of using the filter or subset function with the condition nchar(...) != 0, but I am having trouble referring to that column. Can you help me?
We can use a comparison operator to check whether the 'price' column is not equal to string "\n" to subset the rows
df2 <- subset(df1, price != "\n")
nchar() will not be 0 for these rows, because "\n" counts as one character:
nchar("\n")
#[1] 1
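If the nchar() idea from the question is preferred, one option is to strip whitespace first so that a lone "\n" becomes an empty string. This is only a sketch on a small hypothetical data frame (df1 here is made up, and the currency symbol is left out for simplicity).

```r
# Hypothetical stand-in for the tibble in the question.
df1 <- data.frame(
  position = 1:4,
  price = c("\nab 1.699,00\n", "\n", "\n", "\nab 1.849,00\n"),
  stringsAsFactors = FALSE
)

# trimws() removes leading/trailing whitespace, including newlines,
# so rows that hold only "\n" end up with zero characters.
df2 <- subset(df1, nchar(trimws(price)) > 0)
df2
```

This variant also catches cells that contain only spaces or several newlines, which a direct comparison with "\n" would miss.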

Impute only certain NA's for a variable in a data frame [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 4 years ago.
I'm new to R and exploring the different options it offers. I'm working on a data frame where one variable has about 900 missing values (NAs).
I want to impute 3 different values for NAs;
1st 300 NA's with Value 1.
2nd 300 NA's with Value 2.
3rd 300 NA's with Value 3.
There are a total of 23272 rows in the data.
dim(data)
[1] 23272 2
colSums(is.na(data))
month year
884 884
summary(data$month)
1 2 3 4 5 6 7 8 9 10 11 12 NA's
1977 1658 1837 1584 1703 1920 1789 2046 1955 2026 1845 2048 884
Months 8, 10 and 12 have very similar counts, so I thought of assigning these three months to the NAs, split in the ratio 300:300:284. Usually we would go by the mode, but I want to try this approach.
I assume you mean you have a long vector, some of whose values are NA:
set.seed(42)
df <- data.frame(val = sample(c(1:3, NA_real_), size = 1000, replace = TRUE))
We can keep a running tally of NA's and assign those to the imputed value using integer division with %/%.
library(tidyverse)
df2 <- df %>%
mutate(NA_num = if_else(is.na(val),
cumsum(is.na(val)),
NA_integer_),
imputed = NA_num %/% 100 + 1)
Output:
df2 %>%
slice(397:410) # based on manual examination using this seed
val NA_num imputed
1 NA 98 1
2 NA 99 1
3 3 NA NA
4 1 NA NA
5 1 NA NA
6 3 NA NA
7 3 NA NA
8 2 NA NA
9 NA 100 2
10 1 NA NA
11 NA 101 2
12 2 NA NA
13 1 NA NA
14 2 NA NA
Without an example, I think this will work.
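Basically, filter the NAs into a new table, do the calculation, and bind it back. Assume new_dt is the original data filtered to contain only the NA rows, and dt holds the remaining rows.
library(tidyverse)
new_dt <- data.frame(x1 = 1:900, x2 = NA) %>%
  filter(is.na(x2)) %>%
  # subtract 1 so that rows 1-300, 301-600 and 601-900 fall into blocks 0, 1, 2
  mutate(x2 = case_when((row_number() - 1) %/% 300 == 0 ~ 1,
                        (row_number() - 1) %/% 300 == 1 ~ 2,
                        (row_number() - 1) %/% 300 == 2 ~ 3))
# dt is assumed to hold the non-NA rows with the same columns
dt <- rbind(dt, new_dt)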
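Applied to the question's own numbers, the integer-division idea can be sketched in base R as follows. The month vector is simulated (only its length and NA count follow the question), and the fill values 8, 10 and 12 are the months the asker proposed. Subtracting 1 before dividing makes each block exactly 300 long, with the remainder in the last block.

```r
set.seed(42)
# Simulated data: 23272 rows with 884 missing months, as in the question.
month <- sample(1:12, size = 23272, replace = TRUE)
month[sample(length(month), 884)] <- NA

fill_values <- c(8, 10, 12)                    # months proposed in the question
na_pos <- which(is.na(month))                  # positions of the NAs, in order
block <- (seq_along(na_pos) - 1) %/% 300 + 1   # 1 for NAs 1-300, 2 for 301-600, 3 after
month[na_pos] <- fill_values[block]

table(block)
```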

Hold current value until non-null value occurs [duplicate]

This question already has answers here:
Replacing NAs with latest non-NA value
(21 answers)
Closed 5 years ago.
Hi, I come from a SAS background and I am relatively new to R. I am attempting to convert an existing SAS program into equivalent R code.
I am unsure how to achieve the equivalent of SAS's "retain" and "by" behavior in R.
I have a data frame with two columns: the first is a date column and the second is a numeric value.
The numeric column represents the result of a lab test. The test is conducted semi-regularly, so on some days the data contains null values. The data is ordered by date and the dates are sequential.
Example data looks like this:
Date Result
2017/01/01 15
2017/01/02 NA
2017/01/03 NA
2017/01/04 12
2017/01/05 NA
2017/01/06 13
2017/01/07 11
2017/01/08 NA
I would like to create a third column that contains the most recent result.
If the Result column is null, the new column should be set to the most recent non-null Result; otherwise it should contain the Result value.
My desired output would look like this:
Date Result My_var
2017/01/01 15 15
2017/01/02 NA 15
2017/01/03 NA 15
2017/01/04 12 12
2017/01/05 NA 12
2017/01/06 13 13
2017/01/07 11 11
2017/01/08 NA 11
In SAS I can achieve this with something like following code snippet:
data my_data;
retain My_var;
set input_data;
by date;
if Result not = . then
my_var = result;
run;
I am stumped as to how to do this in R. I do not think R supports BY-group processing as in SAS, or at least I don't know how to set that as an option.
I have naively tried:
my_data <- mutate(input_data, my_var = if(is.na(Result)) {lag(Result)} else {Result})
But I do not think that syntax is correct.
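We can use the na.locf function from the zoo package to fill in the missing values.
library(zoo)
# na.rm = FALSE keeps any leading NA so the result has the same length as the column
dt$My_var <- na.locf(dt$Result, na.rm = FALSE)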
dt
# Date Result My_var
# 1 2017/01/01 15 15
# 2 2017/01/02 NA 15
# 3 2017/01/03 NA 15
# 4 2017/01/04 12 12
# 5 2017/01/05 NA 12
# 6 2017/01/06 13 13
# 7 2017/01/07 11 11
# 8 2017/01/08 NA 11
Or the fill function from the tidyr package.
library(dplyr)
library(tidyr)
dt <- dt %>%
mutate(My_var = Result) %>%
fill(My_var)
dt
# Date Result My_var
# 1 2017/01/01 15 15
# 2 2017/01/02 NA 15
# 3 2017/01/03 NA 15
# 4 2017/01/04 12 12
# 5 2017/01/05 NA 12
# 6 2017/01/06 13 13
# 7 2017/01/07 11 11
# 8 2017/01/08 NA 11
DATA
dt <- read.table(text = "Date Result
2017/01/01 15
2017/01/02 NA
2017/01/03 NA
2017/01/04 12
2017/01/05 NA
2017/01/06 13
2017/01/07 11
2017/01/08 NA",
header = TRUE, stringsAsFactors = FALSE)
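For readers who want to see what "last observation carried forward" does without loading a package, here is a base R sketch on the same Result values. It is an illustration of the idea, not what na.locf or fill do internally.

```r
Result <- c(15, NA, NA, 12, NA, 13, 11, NA)

# cumsum() over the non-missing flags counts how many real values have been
# seen so far, which is the index of the value to carry forward.
idx <- cumsum(!is.na(Result))

# Prepend NA so that idx == 0 (a leading NA) maps to NA instead of failing.
My_var <- c(NA, Result[!is.na(Result)])[idx + 1]
My_var
```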

cross sectional sub-sets in data.table

I have a data.table which contains multiple columns, which is well represented by the following:
DT <- data.table(date = as.IDate(rep(c("2012-10-17", "2012-10-18", "2012-10-19"), each=10)),
session = c(1,2,3), price = c(10, 11, 12,13,14),
volume = runif(30, min=10, max=1000))
I would like to extract a multi-column table that shows the volume traded at each price in a particular type of session, with each column representing a date.
At present, I extract this data one date at a time using the following:
DT[session==1,][date=="2012-10-17", sum(volume), by=price]
and then bind the columns.
Is there a way of obtaining the end product (a table with each column referring to a particular date) without sticking all the single queries together, as I'm currently doing?
thanks
Does the following do what you want?
It uses a combination of reshape2 and data.table:
library(reshape2)
.DT <- DT[,sum(volume),by = list(price,date,session)][, DATE := as.character(date)]
# reshape2 for casting to wide -- it doesn't seem to like IDate columns, hence
# the character DATE column
dcast(.DT, session + price ~ DATE, value.var = 'V1')
session price 2012-10-17 2012-10-18 2012-10-19
1 1 10 308.9528 592.7259 NA
2 1 11 649.7541 NA 816.3317
3 1 12 NA 502.2700 766.3128
4 1 13 424.8113 163.7651 NA
5 1 14 682.5043 NA 147.1439
6 2 10 NA 755.2650 998.7646
7 2 11 251.3691 695.0153 NA
8 2 12 791.6882 NA 275.4777
9 2 13 NA 111.7700 240.3329
10 2 14 230.6461 817.9438 NA
11 3 10 902.9220 NA 870.3641
12 3 11 NA 719.8441 963.1768
13 3 12 361.8612 563.9518 NA
14 3 13 393.6963 NA 718.7878
15 3 14 NA 871.4986 582.6158
If you just wanted session 1:
dcast(.DT[session == 1L], session + price ~ DATE, value.var = 'V1')
session price 2012-10-17 2012-10-18 2012-10-19
1 1 10 308.9528 592.7259 NA
2 1 11 649.7541 NA 816.3317
3 1 12 NA 502.2700 766.3128
4 1 13 424.8113 163.7651 NA
5 1 14 682.5043 NA 147.1439
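As an alternative sketch, reasonably recent versions of data.table ship their own dcast method, which can aggregate and reshape in one call. The example below assumes such a version is installed; the seed is arbitrary, so the volumes will not match the output shown above.

```r
library(data.table)

set.seed(1)  # arbitrary seed, only for reproducibility of this sketch
DT <- data.table(
  date = as.IDate(rep(c("2012-10-17", "2012-10-18", "2012-10-19"), each = 10)),
  session = c(1, 2, 3),
  price = c(10, 11, 12, 13, 14),
  volume = runif(30, min = 10, max = 1000)
)

# Sum volume per price and date for session 1, one column per date.
# fill = NA marks price/date combinations that never occur.
wide <- dcast(DT[session == 1], price ~ date,
              value.var = "volume", fun.aggregate = sum, fill = NA)
wide
```

Dropping the `DT[session == 1]` filter and using `session + price ~ date` as the formula gives the all-sessions table in the same way.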
