Creating new variable using piping in R - r

I'm trying to create a new variable, confirmed_delta_perc, in a chain of piped commands, but I get an error saying that the variable active_delta is not found. I have confirmed that it is in the data frame, but it is not being read, and the new variable is not added either.
COVID %>%
select(county, confirmed, confirmed_delta) %>%
mutate(confirmed_delta_perc = active_delta/active * 100) %>%
filter(confirmed_delta_perc == 32)
Error:
Error in `mutate()`:
! Problem while computing `confirmed_delta_perc =
active_delta/active`.
Caused by error:
! object 'active_delta' not found
This is the full list of directions to include in the pipe:
Using piping, create a link of commands that selects the county, confirmed, and confirmed_delta variables. Create a new variable called confirmed_delta_perc using the mutate() function. The values in this column should be the percentage of active delta cases of all active cases. Filter for all observation(s) that have a confirmed_delta_perc value of 32. Print out all observation(s).
I've tried modifying the mutate() call by reassigning the data frame so that it "redoes" the step and adds the new variable, but that doesn't work either.
No observations actually equal 32, but the pipe should still add the variable, and it does not.
Does anyone have any ideas?
dput(head(COVID))
structure(list(county = c("Washington", "Fountain", "Jay", "Wabash",
"Fayette", "Washington"), confirmed = c(620L, 737L, 930L, 1530L,
1336L, 675L), confirmed_delta = c(18L, 12L, 11L, 49L, 19L, 29L
), deaths = c(5L, 8L, 14L, 25L, 33L, 6L), deaths_delta = c(0L,
1L, 0L, 1L, 0L, 1L), recovered = c(0L, 0L, 0L, 0L, 0L, 0L), recovered_delta = c(0L,
0L, 0L, 0L, 0L, 0L), active = c(615L, 729L, 918L, 1512L, 1305L,
669L), active_delta = c(18L, 11L, 11L, 49L, 19L, 28L), active_delta_perc = c(0.0292682926829268,
0.0150891632373114, 0.0119825708061002, 0.0324074074074074, 0.0145593869731801,
0.0418535127055306)), row.names = c(NA, 6L), class = "data.frame")

For most case counts, it is impossible for any portion of them to be exactly 32%. For instance, we would report 29 of 90 cases as "32%", but that is really 32.222222, which is not strictly equal to 32. So you will need to specify what range around 32 counts as a match. Here, I say anything within 0.5 of 32 on either side, from 31.5 to 32.5, is close enough.
COVID <- COVID %>%
mutate(confirmed_delta_perc = active_delta/active * 100) %>%
filter(abs(confirmed_delta_perc - 32) <= 0.5)

Try this:
COVID <- COVID %>%
mutate(confirmed_delta_perc = active_delta/active * 100) %>%
filter( round(confirmed_delta_perc, 0) == 32)
Filtering with abs(), as suggested by @JonSpring in the comments, is better though.
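Neither answer spells out why active_delta is "not found". In the question's pipe, select() keeps only county, confirmed, and confirmed_delta, so active and active_delta no longer exist by the time mutate() runs. A minimal sketch of one possible fix, assuming it is acceptable to compute the new column before selecting (the data frame below copies a subset of the columns from the dput() output in the question, and the name result is only illustrative):

```r
library(dplyr)

# Subset of the columns shown in the question's dput(head(COVID)) output
COVID <- data.frame(
  county = c("Washington", "Fountain", "Jay", "Wabash", "Fayette", "Washington"),
  confirmed = c(620L, 737L, 930L, 1530L, 1336L, 675L),
  confirmed_delta = c(18L, 12L, 11L, 49L, 19L, 29L),
  active = c(615L, 729L, 918L, 1512L, 1305L, 669L),
  active_delta = c(18L, 11L, 11L, 49L, 19L, 28L)
)

result <- COVID %>%
  # compute the percentage while active and active_delta are still available
  mutate(confirmed_delta_perc = active_delta / active * 100) %>%
  # only now drop the columns that are not needed in the output
  select(county, confirmed, confirmed_delta, confirmed_delta_perc) %>%
  # tolerance-based match instead of strict equality with 32
  filter(abs(confirmed_delta_perc - 32) <= 0.5)

print(result)
```

Another option would be to keep the original order and add active and active_delta to the select() call. Assigning to a new name rather than back to COVID keeps the unfiltered data available.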

Related

How to skip and disregard a row in a loop that can't be read by a line of code or that provides error?

structure(list(`total primary - yes RS` = c(0L, 138L, 101L, 86L,
118L), `total primary - no RS` = c(0L, 29L, 39L, 35L, 38L), `total secondary- yes rs` = c(0L,
6L, 15L, 3L, 15L), `total secondary- no rs` = c(0L, 0L, 7L, 1L,
2L)), row.names = c(NA, -5L), class = c("tbl_df", "tbl", "data.frame"
))
I had previously asked for a line of code that could run a chi-square test for each of four rows:
https://stackoverflow.com/questions/66750999/with-r-i-would-like-to-loop-through-each-row-and-create-corresponding-chisquare/66751018#66751018
Though the script worked, it only worked because all four rows could be run through it.
library(broom)
library(dplyr)
apply(df, 1, function(x) tidy(chisq.test(matrix(x, ncol = 2)))) %>%
bind_rows
I now have a row that contains only zeros, and when I run the same script I get
Error in stats::chisq.test(x, y, ...) :
at least one entry of 'x' must be positive
I tried to do something using tryCatch(), this way:
tryCatch(apply(df, 1, function(x) tidy(chisq.test(matrix(x, ncol = 2))))) %>%
bind_rows
but it did not work. Ultimately the dataset has many rows like this, so I would like the script to handle the problem not only in row 1 but in multiple rows, such as 5, 23, 67 and so on.
I am not sure I am following your code/data exactly, but what if you move your tryCatch statement inside the apply statement and give it an error handler, like so: apply(df, 1, function(x) tryCatch(tidy(chisq.test(matrix(x, ncol = 2))), error = function(e) NULL)) %>% bind_rows? Without an error handler tryCatch() does not catch anything, and the margin has to stay 1 so that the function runs per row. Does that help at all?
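The idea of catching the error per row can be sketched in base R only, so that it runs without broom or dplyr. This is an illustration under assumptions, not the original poster's final code: the short column names and the helper row_test are made up here, and the values are copied from the structure() output in the question.

```r
# Values copied from the question; column names shortened for readability
df <- data.frame(
  prim_yes = c(0L, 138L, 101L, 86L, 118L),
  prim_no  = c(0L, 29L, 39L, 35L, 38L),
  sec_yes  = c(0L, 6L, 15L, 3L, 15L),
  sec_no   = c(0L, 0L, 7L, 1L, 2L)
)

# Hypothetical helper: run the test on row i, return NULL if chisq.test() errors
row_test <- function(i) {
  x <- unlist(df[i, ])
  tryCatch({
    # suppressWarnings() only hides the small-expected-count warning
    res <- suppressWarnings(chisq.test(matrix(x, ncol = 2)))
    data.frame(row = i,
               statistic = unname(res$statistic),
               p.value = res$p.value)
  }, error = function(e) NULL)
}

# rbind() drops the NULL entries, so failing rows are simply skipped
results <- do.call(rbind, lapply(seq_len(nrow(df)), row_test))
print(results)
```

Keeping the row index in the output makes it easy to see which rows were skipped, which matters when the failures are scattered through a larger dataset.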

How can I rearrange the date from d-m-y to m-d-y in R?

I am having issues with the following R code. I am trying to rearrange CSV date values in a column from day-month-year to month-day-year. Two issues arise: the format is changed to year-month-day instead, and this error message appears when I attempt to plot the results:
Error: Column New_Date is a date/time and must be stored as POSIXct, not POSIXlt.
I am new to R and unsure how to fix this error.
I have gone through a lot of similar topics, but because of my lack of knowledge in R, I cannot tell whether they translate to my own code and to the information that I need.
Any help is much appreciated. The code is due relatively soon, so any fast responses are going to be worshipped. Thanks!
structure(list(Date = structure(c(48L, 11L, 36L, 35L, 1L, 14L
), .Label = c("01-02-18", "02-03-18", "02-10-18", "03-01-18",
"03-04-18", "03-05-18", "03-08-18", "03-09-18", "05-07-18", "05-12-18",
"07-02-18", "07-06-18", "07-11-18", "08-03-18", "09-01-18", "09-05-18",
"09-08-18", "09-10-18", "10-01-18", "10-04-18", "10-09-18", "11-07-18",
"12-11-18", "12-12-18", "13-02-18", "13-06-18", "14-03-18", "14-09-18",
"15-01-18", "15-05-18", "16-04-18", "16-08-18", "17-07-18", "18-12-18",
"19-01-18", "19-02-18", "19-06-18", "19-10-18", "19-11-18", "20-03-18",
"20-04-18", "20-08-18", "20-09-18", "21-05-18", "23-07-18", "23-11-18",
"24-12-18", "25-01-18", "25-02-18", "25-05-18", "25-06-18", "25-10-18",
"26-03-18", "26-09-18", "27-04-18", "29-08-18", "30-07-18", "31-05-18",
"31-10-18"), class = "factor"), New_Date = structure(list(sec = c(0,
0, 0, 0, 0, 0), min = c(0L, 0L, 0L, 0L, 0L, 0L), hour = c(0L,
0L, 0L, 0L, 0L, 0L), mday = c(25L, 7L, 19L, 19L, 1L, 8L), mon = c(0L,
1L, 1L, 0L, 1L, 2L), year = c(-1882L, -1882L, -1882L, -1882L,
-1882L, -1882L), wday = c(4L, 3L, 1L, 5L, 4L, 4L), yday = c(24L,
37L, 49L, 18L, 31L, 66L), isdst = c(0L, 0L, 0L, 0L, 0L, 0L),
zone = c("LMT", "LMT", "LMT", "LMT", "LMT", "LMT"), gmtoff = c(NA_integer_,
NA_integer_, NA_integer_, NA_integer_, NA_integer_, NA_integer_
)), class = c("POSIXlt", "POSIXt"))), row.names = c(NA, 6L
), class = "data.frame")
EDIT:
Now this error appears: "Error in plot.window(...) : need finite 'xlim' values"
Below is my code:
beaches$Date = as.Date(as.character(beaches$Date), '%d-%m-%y')
beaches$New_Date = format(beaches$Date, '%m-%d-%y')
Palm_beach = filter(beaches, Site == "Palm Beach")
Shelly_beach = filter(beaches, Site == "Shelly Beach (Manly)")
plot(Palm_beach$Date, Palm_beach$Enterococci..cfu.100ml., col = "green", main = "Palm Beach vs Shelly Beach", xlab = "Dates", ylab = "Enterococci (cfu)")
points(Shelly_beach$Date, Shelly_beach$Enterococci..cfu.100ml., col = "red")
Try this:
beaches$Date = as.Date(as.character(beaches$Date), '%d-%m-%y')
beaches$New_Date = format(beaches$Date, '%m-%d-%y')
Output:
> head(beaches[, c('Date', 'New_Date')])
Date New_Date
1 2018-01-25 01-25-18
2 2018-02-07 02-07-18
3 2018-02-19 02-19-18
4 2018-01-19 01-19-18
5 2018-02-01 02-01-18
6 2018-03-08 03-08-18
Since neither the input nor the output is a date, it might make more sense to just use regular expressions rather than converting to and from dates:
beaches$New_Date <- sub("(\\d+)-(\\d+)-(\\d+)", "\\2-\\1-\\3", beaches$Date)
#### OUTPUT ####
Date New_Date
1 25-01-18 01-25-18
2 07-02-18 02-07-18
3 19-02-18 02-19-18
4 19-01-18 01-19-18
5 01-02-18 02-01-18
6 08-03-18 03-08-18
First of all, you have to make sure that the original Date column is in character format.
In your data it is a factor. Convert the Date column to a date first, and then you can create the New_Date column:
df$Date <- as.Date(as.character(df$Date), format = "%d-%m-%y")
df$New_Date <- format(df$Date, "%m-%d-%Y")
If you only want the last two digits of the year column you can use this instead:
df$New_Date2 <- format(df$Date, "%m-%d-%y")
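As a quick check of the conversion discussed in these answers, here is a small self-contained sketch. The six values are the ones shown in the question's output; the column name Date_parsed is an assumption added here so that the original strings are not overwritten.

```r
# The six dates from the question, stored as a factor like in the original data
beaches <- data.frame(
  Date = factor(c("25-01-18", "07-02-18", "19-02-18",
                  "19-01-18", "01-02-18", "08-03-18"))
)

# Parse day-month-year into a real Date (it prints as year-month-day)
beaches$Date_parsed <- as.Date(as.character(beaches$Date), format = "%d-%m-%y")

# format() returns a character vector: fine for display, not for date axes
beaches$New_Date <- format(beaches$Date_parsed, "%m-%d-%y")

print(beaches)
str(beaches)
```

For plotting, the Date_parsed column is the one to pass to plot(), since New_Date is plain text. Writing the parsed dates into a separate column also means the conversion can be re-run safely; parsing an already converted column with '%d-%m-%y' would likely return NA, which may be related to the "need finite 'xlim' values" error in the edit.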

What is the best way to use agricolae to do ANOVAs on a split plot design?

I'm trying to run some ANOVAs on data from a split plot experiment, ideally using the agricolae package. It's been a while since I've taken a stats class and I wanted to be sure I'm analyzing this data correctly, so I did some searching online and couldn't really find consistency in the way people were analyzing their split plot experiments. What is the best way for me to do this?
Here's the head of my data:
dput(head(rawData))
structure(list(ï..Plot = 2111:2116, Variety = structure(c(5L,
4L, 3L, 6L, 1L, 2L), .Label = c("Burbank", "Hodag", "Lamoka",
"Norkotah", "Silverton", "Snowden"), class = "factor"), Rate = c(4L,
4L, 4L, 4L, 4L, 4L), Rep = c(1L, 1L, 1L, 1L, 1L, 1L), totalTubers = c(594L,
605L, 656L, 729L, 694L, 548L), totalOzNoCulls = c(2544.18, 2382.07,
2140.69, 2401.56, 2440.56, 2503.5), totalCWTacNoCulls = c(461.76867,
432.345705, 388.535235, 435.88314, 442.96164, 454.38525), avgLWratio = c(1.260615419,
1.287949374, 1.111981583, 1.08647584, 1.350686661, 1.107173509
), Hollow = c(14L, 15L, 22L, 25L, 14L, 13L), Double = c(10L,
13L, 15L, 22L, 11L, 9L), Knob = c(86L, 80L, 139L, 156L, 77L,
126L), Researcher = structure(c(1L, 1L, 1L, 1L, 1L, 1L), .Label = "Wang", class = "factor"),
CullsPounds = c(1.75, 1.15, 4.7, 1.85, 0.8, 5.55), CullsOz = c(28,
18.4, 75.2, 29.6, 12.8, 88.8), totalOz = c(2572.18, 2400.47,
2215.89, 2431.16, 2453.36, 2592.3), totalCWTacCulls = c(466.85067,
435.685305, 402.184035, 441.25554, 445.28484, 470.50245)), row.names = c(NA,
6L), class = "data.frame")
For these data, the whole plot is Rate, the split plot is Variety, the block is Rep, and for discussion's sake here, we can look at totalCWTacNoCulls as the response.
Any help would be very much appreciated! I am still getting the hang of Stack Overflow, so if I have made any mistakes or shared my data wrong, please let me know and I'll change it. Thank you!
You can do this using the agricolae package as follows:
library(agricolae)
attach(rawData)
Rate = factor(Rate)
Variety = factor(Variety)
Rep = factor(Rep)
sp.plot(Rep, Rate, Variety, totalCWTacNoCulls)
Usage according to the agricolae package is
sp.plot(block, pplot, splot, Y)
where block is the replications, pplot is the main-plot factor, splot is the sub-plot factor, and Y is the response variable.
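For comparison, a split-plot ANOVA can also be written with base R's aov() and an Error() term. This is a different route from sp.plot(), shown only as a sketch: the data below are simulated placeholders (the question's head() shows a single Rate and Rep), and the Rate levels, number of reps, and column name yield are assumptions.

```r
set.seed(1)

# Simulated placeholder layout: 4 blocks x 3 rates (whole plots) x 3 varieties (sub-plots)
sim <- expand.grid(
  Rep = factor(1:4),
  Rate = factor(c(2, 4, 6)),   # hypothetical rate levels
  Variety = factor(c("Burbank", "Hodag", "Lamoka"))
)
sim$yield <- rnorm(nrow(sim), mean = 440, sd = 25)  # random values, not real yields

# Whole-plot factor (Rate) is tested against the Rep:Rate stratum,
# Variety and Rate:Variety against the within (sub-plot) stratum
fit <- aov(yield ~ Rate * Variety + Error(Rep / Rate), data = sim)
summary(fit)
```

With real data, replacing sim and yield by rawData and totalCWTacNoCulls (after converting Rate, Variety, and Rep to factors) should give a table that can be compared against the sp.plot() output.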

Rotating y axis labels with mosaic plots WITHOUT overlap

This question is very similar to this one, but it approaches the problem from another angle that has not been answered.
Following the proposed code, I am able to generate mosaic plots and rotate the labels so that they are legible. The problem comes when (it seems) the mosaic() function from vcd package does not recognise the rotation and so it does not adapt the graph to fit the labels, yielding results like the following:
Is there any way to change the margins between the labels and the titles? I would be surprised if I am the first one that has encountered this issue. I am open to using other packages to get mosaic graphs if applicable as well.
Code
aux = structure(c(0L, 0L, 3L, 46L, 107L, 14L, 0L, 0L, 4L, 0L, 0L, 2L,
9L, 0L, 23L, 2L, 1L, 3L, 14L, 1L, 8L, 26L, 6L, 11L, 6L, 1L, 6L,
0L, 1L, 1L, 29L, 10L, 62L, 1L, 3L, 1L, 1L, 3L, 1L), .Dim = c(3L,
13L), .Dimnames = list(abcdefghi = c("Madrid", "Valencia", "Granada"
), jklmnopqr = c("roknbjftxcwl", "mfchldbxuyig", "gtyoxeduijpw",
"akbcefymvsiw", "ucbfxplietqk", "mzeykauprfdh", "piermgawyjht",
"chjvatqbylxo", "merhcogjflbd", "wiyrugvmhjlq", "glszdqmjhkov",
"giowaxrtsknm", "pxucytzvljqw")), class = "table")
library(vcd)
colours = c("brown","darkgreen","darkgrey","orange","darkred","gold","blue","red",
"white","pink","purple","navy","lightblue","green","peachpuff","violet","yellow","yellow4")
aux_names = names(attr(aux,"dimnames"))
mosaic(aux,main=paste(aux_names,collapse=" vs. "),
gp=gpar(fill=matrix(sample(colours,max(nrow(aux),ncol(aux))),1,max(nrow(aux),ncol(aux)))),
pop = FALSE,labeling = labeling_border(rot_labels=c(90,0,0,0),
just_labels=c("left","right")))
This code should do what I think you're after.
mosaic(aux,main=paste(aux_names,collapse=" vs. "),
gp=gpar(fill=matrix(sample(colours,max(nrow(aux),ncol(aux))),1,max(nrow(aux),ncol(aux)))),
pop = FALSE,labeling = labeling_border(rot_labels=c(90,0,0,0),
just_labels=c("left","right"),
offset_varnames = c(8,8,8,8)),
margins = c(10, 10, 10, 10))

R Program Vector, record Column Percent

This is my vector
head(sep)
I must find the percentage of all SEP 11 in each row.
For instance, in the first row, the percentage of SEP 11 is
100 * ((63 + 124)/ (63 + 124 + 0 + 0))
I would like this stored in a newly created 8th column.
Thanks
dput
> dput(head(sep))
structure(list(Site = structure(1:6, .Label = c("31R001", "31R002",
"31R003", "31R004", "31R005", "31R006", "31R007", "31R008", "31R011",
"31R013", "31R014", "31R016", "31R018", "31R019", "31R020", "31R021",
"31R022", "31R023", "31R024", "31R025", "31R026", "31R027", "31R029",
"31R030", "31R031", "31R032", "31R034", "31R035", "31R036", "31R038",
"31R039", "31R040", "31R041", "31R042", "31R043", "31R044", "31R045",
"31R046", "31R048", "31R049", "31R050", "31R051", "31R052", "31R053",
"31R054", "31R055", "31R056", "31R057", "31R058", "31R059", "31R060",
"31R061", "31R069", "31R071", "31R072", "31R075", "31R435", "31R440",
"31R445", "31R450", "31R455", "31R460", "31R470", "31R600", "31R722",
"31R801", "31R825", "31R826", "31R829", "31R840", "31R843", "31R861",
"31R880"), class = "factor"), Latitude = c(33.808874, 33.877256,
33.820825, 33.852373, 33.829697, 33.810274), Longitude = c(-117.844048,
-117.700135, -117.811845, -117.795516, -117.787532, -117.830429
), Windows.SEP.11 = c(63L, 174L, 11L, 85L, 163L, 71L), Mac.SEP.11 = c(0L,
1L, 4L, 0L, 0L, 50L), Windows.SEP.12 = c(124L, 185L, 9L, 75L,
23L, 5L), Mac.SEP.12 = c(0L, 1L, 32L, 1L, 0L, 50L)), .Names = c("Site",
"Latitude", "Longitude", "Windows.SEP.11", "Mac.SEP.11", "Windows.SEP.12",
"Mac.SEP.12"), row.names = c(NA, 6L), class = "data.frame")
Assuming that you want the row sums of the columns that have 'Windows' in their names, we subset the dataset ("sep1") using grep. Then we take rowSums(Sub1), divide by the row sums of all the numeric count columns (sep1[4:7]), multiply by 100, and assign the result to a new column ("newCol"):
Sub1 <- sep1[grep("Windows", names(sep1))]
sep1$newCol <- 100*rowSums(Sub1)/rowSums(sep1[4:7])
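The same calculation can be checked on the six rows from the question. The sketch below selects the count columns by name rather than by position; the column name WindowsPct is an assumption, and Latitude and Longitude are left out for brevity.

```r
# Count columns copied from the question's dput(head(sep)) output
sep <- data.frame(
  Site = c("31R001", "31R002", "31R003", "31R004", "31R005", "31R006"),
  Windows.SEP.11 = c(63L, 174L, 11L, 85L, 163L, 71L),
  Mac.SEP.11 = c(0L, 1L, 4L, 0L, 0L, 50L),
  Windows.SEP.12 = c(124L, 185L, 9L, 75L, 23L, 5L),
  Mac.SEP.12 = c(0L, 1L, 32L, 1L, 0L, 50L)
)

# All count columns, and the Windows subset of them, picked by name
count_cols <- grep("SEP", names(sep), value = TRUE)
windows_cols <- grep("^Windows", count_cols, value = TRUE)

# Row-wise share of the Windows counts, in percent
sep$WindowsPct <- 100 * rowSums(sep[windows_cols]) / rowSums(sep[count_cols])

print(sep)
```

For the first row this reproduces the hand calculation in the question, 100 * (63 + 124) / (63 + 124 + 0 + 0). Selecting by name keeps the code working if the column order changes.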
