Conditional subtraction in R data frame

I have a fairly straightforward need, but I can't find a previously asked question that is similar enough. I've been trying with dplyr, but can't figure it out.
julian year
088 22
049 19
041 22
105 18
125 22
245 20
For each row where data$julian < 105, I want to subtract 1 from data$year, so that the result is
julian year
088 21
049 18
041 21
105 18
125 22
245 20

The OP asked about dplyr, so here is one dplyr option. julian is stored as character in the data below, so it is converted to numeric before the comparison:
library(dplyr)
df1 <- df1 %>%
  mutate(year = case_when(as.numeric(julian) < 105 ~ year - 1,
                          TRUE ~ as.numeric(year)))
-output
df1
julian year
1 088 21
2 049 18
3 041 21
4 105 18
5 125 22
6 245 20
data
df1 <- structure(list(julian = c("088", "049", "041", "105", "125",
"245"), year = c(22L, 19L, 22L, 18L, 22L, 20L)), row.names = c(NA,
-6L), class = "data.frame")

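Another option with base R. If julian is stored as character, convert it to numeric first so the comparison is numeric rather than string-based:
idx <- as.numeric(df$julian) < 105
df$year[idx] <- df$year[idx] - 1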
Output
julian year
1 088 21
2 049 18
3 041 21
4 105 18
5 125 22
6 245 20
Data
df <- structure(list(julian = c("088", "049", "041", "105", "125",
"245"), year = c(22L, 19L, 22L, 18L, 22L, 20L)), row.names = c(NA,
-6L), class = "data.frame")
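One caveat about the comparison `julian < 105`: when `julian` is stored as character (as in the dplyr answer's data), R coerces the number to "105" and compares strings. That happens to give the right answer for zero-padded three-digit values, but not in general. A minimal sketch of the difference, using a made-up unpadded value "9" purely for illustration:

```r
julian <- c("088", "049", "041", "105", "125", "245")

# Character vs. number: 105 is coerced to "105" and compared as text
as_text <- julian < 105

# Explicit conversion compares the values as numbers
as_number <- as.numeric(julian) < 105

# The two agree here only because every value is zero-padded to three digits
identical(as_text, as_number)

# Hypothetical unpadded value: "9" sorts after "105" as text
"9" < 105              # FALSE (string comparison)
as.numeric("9") < 105  # TRUE  (numeric comparison)
```

Converting explicitly with as.numeric() keeps the result independent of how the values happen to be padded.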

Related

Calculating Percent Change in R for Multiple Variables

I'm trying to calculate percent change in R, where the time point is part of each column label (table below). I have dplyr loaded, and my dataset is loaded in R under the name data. Below is the code I'm using, but it's not calculating correctly. I want to create a new dataframe called data_per_chg which contains the percent change of each variable from its "v1" value. For instance, for the wbc variable, I would like to calculate the percent change of wbc.v1 from wbc.v1, wbc.v2 from wbc.v1, wbc.v3 from wbc.v1, etc., and do that for all the remaining variables in my dataset. I'm assuming I can probably use a loop to do this easily, but I'm pretty new to R so I'm not quite sure how to proceed. Any guidance will be greatly appreciated.
id wbc.v1 wbc.v2 wbc.v3 rbc.v1 rbc.v2 rbc.v3 hct.v1 hct.v2 hct.v3
a1 23 63 30 23 56 90 13 89 47
a2 81 45 46 N/A 18 78 14 45 22
a3 NA 27 14 29 67 46 37 34 33
data_per_chg <- data %>%
  group_by(id) %>%
  arrange(id) %>%
  mutate(change = (wbc.v2 - wbc.v1) / (wbc.v1))
data_per_chg
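Assuming the missing values are either a true NA or the string "N/A", first convert "N/A" to NA, then convert the column types, and finally compute the change of every non-'v1' column relative to the matching 'v1' column
library(dplyr)
library(stringr)
data <- data %>%
  # apply na_if() per character column rather than to the whole data frame
  mutate(across(where(is.character), ~ na_if(.x, "N/A"))) %>%
  type.convert(as.is = TRUE) %>%
  mutate(across(-c(id, matches("\\.v1$")), ~ {
    # look up the 'v1' column that belongs to the current column
    v1 <- get(str_replace(cur_column(), "v\\d+$", "v1"))
    (.x - v1) / v1
  }, .names = "{.col}_change"))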
-output
data
id wbc.v1 wbc.v2 wbc.v3 rbc.v1 rbc.v2 rbc.v3 hct.v1 hct.v2 hct.v3 wbc.v2_change wbc.v3_change rbc.v2_change rbc.v3_change hct.v2_change hct.v3_change
1 a1 23 63 30 23 56 90 13 89 47 1.7391304 0.3043478 1.434783 2.9130435 5.84615385 2.6153846
2 a2 81 45 46 NA 18 78 14 45 22 -0.4444444 -0.4320988 NA NA 2.21428571 0.5714286
3 a3 NA 27 14 29 67 46 37 34 33 NA NA 1.310345 0.5862069 -0.08108108 -0.1081081
If we also want a change column for the 'v1' columns (always 0, or NA where 'v1' is missing) and only the change columns in the result
data %>%
  # apply na_if() per character column rather than to the whole data frame
  mutate(across(where(is.character), ~ na_if(.x, "N/A"))) %>%
  type.convert(as.is = TRUE) %>%
  mutate(across(ends_with('.v1'), ~ .x - .x,
                .names = "{str_replace(.col, 'v1', 'v1change')}")) %>%
  transmute(id, across(ends_with('change')),
            across(-c(id, matches("\\.v1$"), ends_with('change')),
                   ~ {
                     v1 <- get(str_replace(cur_column(), "v\\d+$", "v1"))
                     (.x - v1) / v1
                   }, .names = "{.col}_change")) %>%
  select(id, starts_with('wbc'), starts_with('rbc'), starts_with('hct'))
-output
id wbc.v1change wbc.v2_change wbc.v3_change rbc.v1change rbc.v2_change rbc.v3_change hct.v1change hct.v2_change hct.v3_change
1 a1 0 1.7391304 0.3043478 0 1.434783 2.9130435 0 5.84615385 2.6153846
2 a2 0 -0.4444444 -0.4320988 NA NA NA 0 2.21428571 0.5714286
3 a3 NA NA NA 0 1.310345 0.5862069 0 -0.08108108 -0.1081081
data
data <- structure(list(id = c("a1", "a2", "a3"), wbc.v1 = c(23L, 81L,
NA), wbc.v2 = c(63L, 45L, 27L), wbc.v3 = c(30L, 46L, 14L), rbc.v1 = c("23",
"N/A", "29"), rbc.v2 = c(56L, 18L, 67L), rbc.v3 = c(90L, 78L,
46L), hct.v1 = c(13L, 14L, 37L), hct.v2 = c(89L, 45L, 34L), hct.v3 = c(47L,
22L, 33L)), class = "data.frame", row.names = c(NA, -3L))
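The question mentions using a loop. As a rough base R sketch of that idea (not part of the answer above), the loop below pairs each time-point column with the `.v1` column of the same variable and stores the relative change in a new column. Only the `wbc` columns of the example are reproduced here, and the `_change` suffix simply follows the naming used above.

```r
data <- data.frame(
  id     = c("a1", "a2", "a3"),
  wbc.v1 = c(23, 81, NA),
  wbc.v2 = c(63, 45, 27),
  wbc.v3 = c(30, 46, 14)
)

# All time-point columns, e.g. "wbc.v1", "wbc.v2", ...
value_cols <- grep("\\.v[0-9]+$", names(data), value = TRUE)

for (col in value_cols) {
  # Baseline column for this variable, e.g. "wbc.v3" -> "wbc.v1"
  base_col <- sub("\\.v[0-9]+$", ".v1", col)
  data[[paste0(col, "_change")]] <-
    (data[[col]] - data[[base_col]]) / data[[base_col]]
}

data[, c("id", "wbc.v2_change", "wbc.v3_change")]
```

The values are proportions, as in the output above; multiply by 100 if a percentage is wanted.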

How to replace data in a column in R?

So I have a dataframe called "myData"
print(myData)
ID Name Status AGE
123 Mike Yes 18
124 John No 20
125 Lily Yes 21
126 Jasper No 24
127 Toby Yes 27
128 Will No 19
129 Oscar Yes 32
I received a second dataframe, "myData2", with updated "Status" values.
This dataframe has fewer observations than my original one and only has ID and Status.
This is the updated dataframe
print(myData2)
ID Status
123 Yes
125 Yes
126 Yes
128 No
129 No
Is there a function that can update the 'Status' column in myData with the data in myData2, matching on the column "ID"?
This is my desired output
ID Name Status AGE
123 Mike Yes 18
124 John No 20
125 Lily Yes 21
126 Jasper Yes 24
127 Toby Yes 27
128 Will No 19
129 Oscar No 32
We can use data.table join to quickly update the first dataset 'Status' with the values of second after joining on 'ID'
library(data.table)
setDT(myData)[myData2, Status := i.Status, on = .(ID)]
myData
# ID Name Status AGE
#1: 123 Mike Yes 18
#2: 124 John No 20
#3: 125 Lily Yes 21
#4: 126 Jasper Yes 24
#5: 127 Toby Yes 27
#6: 128 Will No 19
#7: 129 Oscar No 32
In dplyr, we do a left_join and then coalesce the 'Status' columns
library(dplyr)
myData %>%
left_join(myData2, by = 'ID') %>%
mutate(Status = coalesce(Status.y, Status.x)) %>%
select(-Status.x, -Status.y)
data
myData <- structure(list(ID = 123:129, Name = c("Mike", "John", "Lily",
"Jasper", "Toby", "Will", "Oscar"), Status = c("Yes", "No", "Yes",
"No", "Yes", "No", "Yes"), AGE = c(18L, 20L, 21L, 24L, 27L, 19L,
32L)), class = "data.frame", row.names = c(NA, -7L))
myData2 <- structure(list(ID = c(123L, 125L, 126L, 128L, 129L), Status = c("Yes",
"Yes", "Yes", "No", "No")), class = "data.frame", row.names = c(NA,
-5L))
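Here is a base R solution using merge. Note that merge() sorts its result by ID, so assigning the result back to myData relies on myData already being sorted by ID: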
myData$Status <- with(merge(myData,myData2,by = "ID",all.x = TRUE),
ifelse(is.na(Status.y),Status.x,Status.y))
such that
> myData
ID Name Status AGE
1 123 Mike Yes 18
2 124 John No 20
3 125 Lily Yes 21
4 126 Jasper Yes 24
5 127 Toby Yes 27
6 128 Will No 19
7 129 Oscar No 32
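Another base R possibility, sketched here as an alternative to the answers above, is to look up the row positions with match() and overwrite only those rows. It does not depend on the row order of either data frame. The data are re-created so the snippet runs on its own; `idx` and `found` are names chosen for this illustration.

```r
myData <- data.frame(
  ID     = 123:129,
  Name   = c("Mike", "John", "Lily", "Jasper", "Toby", "Will", "Oscar"),
  Status = c("Yes", "No", "Yes", "No", "Yes", "No", "Yes"),
  AGE    = c(18L, 20L, 21L, 24L, 27L, 19L, 32L),
  stringsAsFactors = FALSE
)
myData2 <- data.frame(
  ID     = c(123L, 125L, 126L, 128L, 129L),
  Status = c("Yes", "Yes", "Yes", "No", "No"),
  stringsAsFactors = FALSE
)

# Row of myData for each ID in myData2 (NA if the ID is absent from myData)
idx <- match(myData2$ID, myData$ID)
found <- !is.na(idx)

# Overwrite only the matched rows; all other rows keep their old Status
myData$Status[idx[found]] <- myData2$Status[found]
myData
```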

How to remove a list of observations from a dataframe with dplyr in R? [duplicate]

This is my dataframe x
ID Name Initials AGE
123 Mike NA 18
124 John NA 20
125 Lily NA 21
126 Jasper NA 24
127 Toby NA 27
128 Will NA 19
129 Oscar NA 32
I also have a numeric vector y (num [1:3]) holding the IDs I want to remove from data frame x:
>print(y)
[1] 124 125 129
My goal is to remove all the IDs in y from data frame x
This is my desired output
ID Name Initials AGE
123 Mike NA 18
126 Jasper NA 24
127 Toby NA 27
128 Will NA 19
I'm using the dplyr package and trying this, but it's not working:
FinalData <- x %>%
select(everything()) %>%
filter(ID != c(y))
Can anyone tell me what needs to be corrected?
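The != operator compares ID with y element by element (recycling y), so it does not test whether an ID occurs anywhere in y. When the length of 'y' is greater than 1, use %in% and negate it with !. The select step is not needed, as everything() simply selects all the columns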
library(dplyr)
x %>%
filter(!ID %in% y)
# ID Name Initials AGE
#1 123 Mike NA 18
#2 126 Jasper NA 24
#3 127 Toby NA 27
#4 128 Will NA 19
Another option is anti_join, naming the join column explicitly
x %>%
  anti_join(tibble(ID = y), by = "ID")
In base R, subset can be used
subset(x, !ID %in% y)
data
y <- c(124, 125, 129)
x <- structure(list(ID = 123:129, Name = c("Mike", "John", "Lily",
"Jasper", "Toby", "Will", "Oscar"), Initials = c(NA, NA, NA,
NA, NA, NA, NA), AGE = c(18L, 20L, 21L, 24L, 27L, 19L, 32L)),
class = "data.frame", row.names = c(NA,
-7L))
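To see why the original `ID != c(y)` filter keeps every row, it helps to look at the two logical vectors directly. This is a small base R sketch using the same IDs as the example; `recycled` and `keep` are names made up for the illustration.

```r
ID <- 123:129
y <- c(124, 125, 129)

# != recycles y along ID (124, 125, 129, 124, ...), so each ID is compared
# with just one element of y; R also warns that the lengths do not divide evenly
recycled <- suppressWarnings(ID != y)

# %in% asks, for each ID, whether it appears anywhere in y
keep <- !(ID %in% y)

ID[recycled]  # nothing is dropped in this example
ID[keep]      # 123 126 127 128
```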

How to create a new table from original data and lookup table in R or matlab?

I have original temperature data in table1.txt with station number header which reads as
Date 101 102 103
1/1/2001 25 24 23
1/2/2001 23 20 15
1/3/2001 22 21 17
1/4/2001 21 27 18
1/5/2001 22 30 19
I have a lookup table file lookup.txt which reads as :
ID Station
1 101
2 103
3 102
4 101
5 102
Now, I want to create a new table (new.txt) with ID number header which should read as
Date 1 2 3 4 5
1/1/2001 25 23 24 25 24
1/2/2001 23 15 20 23 20
1/3/2001 22 17 21 22 21
1/4/2001 21 18 27 21 27
1/5/2001 22 19 30 22 30
Is there any way I can do this in R or MATLAB?
I came up with a solution using tidyverse. It involves some wide to long transformation, matching the data frames on Station, and then spreading the variables.
# Read the data (file names as given in the question)
library(tidyverse)
df1 <- read_table("table1.txt")
lookup <- read_table("lookup.txt")
# Create the output
k1 <- df1 %>%
  gather(Station, value, -Date) %>%
  mutate(Station = as.numeric(Station)) %>%
  inner_join(lookup, by = "Station") %>%
  select(-Station) %>%
  spread(ID, value)
k1
We can use base R to do this. Create a column index by matching the 'Station' column with the names of the first dataset, use that to duplicate the columns of 'df1', and then change the column names to the 'ID' column of the second dataset (the lookup table, called 'df2' here)
i1 <- with(df2, match(Station, names(df1)[-1]))
dfN <- df1[c(1, i1 + 1)]
names(dfN)[-1] <- df2$ID
dfN
# Date 1 2 3 4 5
#1 1/1/2001 25 23 24 25 24
#2 1/2/2001 23 15 20 23 20
#3 1/3/2001 22 17 21 22 21
#4 1/4/2001 21 18 27 21 27
#5 1/5/2001 22 19 30 22 30
data
df1 <- structure(list(Date = c("1/1/2001", "1/2/2001", "1/3/2001", "1/4/2001",
"1/5/2001"), `101` = c(25L, 23L, 22L, 21L, 22L), `102` = c(24L,
20L, 21L, 27L, 30L), `103` = c(23L, 15L, 17L, 18L, 19L)),
class = "data.frame", row.names = c(NA,
-5L))
df2 <- structure(list(ID = 1:5, Station = c(101L, 103L, 102L, 101L,
102L)), class = "data.frame", row.names = c(NA, -5L))
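The question also asks for the result to be saved as new.txt. A possible sketch of that last step follows, selecting the station columns by name instead of by position and writing a space-delimited file with write.table(). Only the first two dates are reproduced to keep it short, and a temporary file stands in for new.txt.

```r
df1 <- data.frame(
  Date  = c("1/1/2001", "1/2/2001"),
  `101` = c(25L, 23L),
  `102` = c(24L, 20L),
  `103` = c(23L, 15L),
  check.names = FALSE
)
df2 <- data.frame(ID = 1:5, Station = c(101L, 103L, 102L, 101L, 102L))

# Pick the station columns by name, in lookup order (repeats are allowed)
dfN <- df1[c("Date", as.character(df2$Station))]
names(dfN) <- c("Date", df2$ID)

# Write a space-delimited file; a temporary path stands in for "new.txt"
out_file <- tempfile(fileext = ".txt")
write.table(dfN, out_file, quote = FALSE, row.names = FALSE)
readLines(out_file)
```

Replacing `out_file` with "new.txt" would write the file into the working directory.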
Here is an option with MATLAB:
T = readtable('table1.txt','FileType','text','ReadVariableNames',1);
L = readtable('lookup.txt','FileType','text','ReadVariableNames',1);
old_header = strcat('x',num2str(L.Station));
newT = array2table(zeros(height(T),height(L)+1),...
'VariableNames',[{'Date'} strcat('x',num2cell(num2str(L.ID)).')]);
newT.Date = T.Date;
for k = 1:size(old_header,1)
newT{:,k+1} = T.(old_header(k,:));
end
writetable(newT,'new.txt','Delimiter',' ')

error -x should be numeric in data frame

I have the dataset like:
name state num1 num2 num3
abc rt 10 40 8
def ka 20 50 15
ert pn 30 60 16
I want the row sums of each row. When I use rowSums(data), it throws an error saying that x should be numeric. The new column should be the total of num1, num2 and num3.
Here are some of the suggested solutions. First, as always, create some data,
dta <- structure(list(name = structure(1:3, .Label = c("abc", "def",
"ert"), class = "factor"), state = structure(c(3L, 1L, 2L), .Label = c("ka",
"pn", "rt"), class = "factor"), num1 = c(10L, 20L, 30L), num2 = c(40L,
50L, 60L), num3 = c(8L, 15L, 16L)), .Names = c("name", "state",
"num1", "num2", "num3"), class = "data.frame", row.names = c(NA,
-3L))
Second, almost always, show the data,
dta
#> name state num1 num2 num3
#> 1 abc rt 10 40 8
#> 2 def ka 20 50 15
#> 3 ert pn 30 60 16
maybe also use str(), as it is relevant for understanding the specific problem here,
str(dta)
#> 'data.frame': 3 obs. of 5 variables:
#> $ name : Factor w/ 3 levels "abc","def","ert": 1 2 3
#> $ state: Factor w/ 3 levels "ka","pn","rt": 3 1 2
#> $ num1 : int 10 20 30
#> $ num2 : int 40 50 60
#> $ num3 : int 8 15 16
The problem originates in the data being a mix of factors and integers; obviously, we cannot sum factors.
Now to some solutions.
First, akrun's first solution,
rowSums(dta[grep("num\\d+", names(dta))])
#> [1] 58 85 106
Second, Renu's solution,
rowSums(dta[,sapply(dta, is.numeric)])
#> [1] 58 85 106
Third, a slightly reworded version of akrun's second solution,
# install.packages(c("tidyverse"), dependencies = TRUE)
library(tidyverse)
dta %>% select(matches("num\\d+")) %>% mutate(rowsum = rowSums(.))
#> num1 num2 num3 rowsum
#> 1 10 40 8 58
#> 2 20 50 15 85
#> 3 30 60 16 106
Fourth, a plyr option. Note that numcolwise(sum) returns the column sums of the numeric columns rather than the row sums,
# install.packages(c("plyr"), dependencies = TRUE)
plyr::numcolwise(sum)(dta)
#> num1 num2 num3
#> 1 60 150 39
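To attach the total as a new column of the full data frame, which is what the question asks for, one possible base R sketch is shown below. The column name `total` and the helper `is_num` are choices made for this illustration; the data are re-created as plain character and integer columns so the snippet runs on its own.

```r
dta <- data.frame(
  name  = c("abc", "def", "ert"),
  state = c("rt", "ka", "pn"),
  num1  = c(10L, 20L, 30L),
  num2  = c(40L, 50L, 60L),
  num3  = c(8L, 15L, 16L)
)

# Logical flag per column: TRUE for the numeric ones
is_num <- vapply(dta, is.numeric, logical(1))

# Sum only the numeric columns and store the result alongside the original data
dta$total <- unname(rowSums(dta[is_num]))
dta
```

Because the non-numeric columns are excluded before calling rowSums(), the same code works whether `name` and `state` are factors or character vectors.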
Finally, here is an almost identical question. Now they are at least linked.
