I'm relatively new to R and really new to creating functions, but I need to make some so that others who don't use R can use them easily. I'm looking for a function that will let me reference the top cell (the first row) of a column. I've searched pretty hard, but everything I've found seems to return errors. The function I need will subtract cells from the same column as the reference, moving down the column while the reference remains fixed. Code so far is:
df <- tibble(depth.along.core = 1:5,
Age.cal.BP = 1:5,
AFB = 1:5,
assumed.C = rep(0.5, 5))
Mega_Bog <- function(data) {
require(dplyr)
data %>% mutate(PCAR=((lead(depth.along.core) - depth.along.core )/(lead(Age.cal.BP) - Age.cal.BP))*AFBD*assumed.C*10000,
PCA_NCP = PCAR*(lead(Age.cal.BP)-Age.cal.BP),
PCA_NCP[is.na(PCA_NCP)] <- 0,
CCP_Bottom_Up = rev(cumsum(rev(PCA_NCP)))
CCP_Top_Down = CCP_Bottom_Up##Need cell reference here## -CCP_Bottom_Up)}
I know the is.na is a bit weird but it works, although I'm not sure why. The output I'm after will look like:
CCP_Top_Down <- c(0,-1,-2,-3,-4)
Thanks people
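One way to keep the reference fixed, sketched here under the assumption that "top cell" means the first row of the column: dplyr's first() returns the first element of a vector, and R recycles that single value against the rest of the column inside mutate(). The column name x and the helper name subtract_from_top are made up for illustration.

```r
library(dplyr)

# Hypothetical helper: subtract every cell of `x` from the first cell of `x`.
subtract_from_top <- function(data) {
  data %>%
    mutate(CCP_Top_Down = first(x) - x)
}

example <- data.frame(x = 1:5)
result <- subtract_from_top(example)
result$CCP_Top_Down
# 0 -1 -2 -3 -4
```

In the function from the question, the same idea would read first(CCP_Bottom_Up) - CCP_Bottom_Up; base R's CCP_Bottom_Up[1] should behave the same way.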
I feel stupid for asking such a simple question, but I am banging my head against the wall.
Why does paste0() create a string that cannot be interpreted as the name of a new object? Is there a better way to create the LHS?
As input I have a dataframe. As an output I want to have a new filtered dataframe. This works fine as long as I manually type all the code. However, I am trying to reduce repetition, and therefore I want to create a function that does the same thing, but then it is not working anymore.
library(magrittr)
library(dplyr) # filter() below is dplyr::filter(), not stats::filter()
df <- data.frame(
var_a = round(runif(20), digits = 1),
var_b = sample(letters, 20)
)
### Find duplicates
df$duplicate_num <- duplicated(df$var_a)
df$duplicate_txt <- duplicated(df$var_b)
df # a check
### Create two lists of duplicates
list_of_duplicate_num <-
df %>%
filter(duplicate_num)
list_of_duplicate_num # a check
list_of_duplicate_txt <-
df %>%
filter(duplicate_txt)
list_of_duplicate_txt # a check
So far everything works as expected.
I would like to simplify the code and turn this into a function that takes the argument "num" or "txt". But I am having problems with creating the LHS.
The below should, in my mind, do the same as the code above.
paste0("list_of_duplicate_", "num") <-
df %>%
filter(duplicate_num)
I do get an error message:
Error in paste0("list_of_duplicate_", "num") <- df %>%
filter(duplicate_num) :
target of assignment expands to non-language object
My goal is to create a function with something like this:
make_list_of_duplicates <- function(criteria = "num") {
paste0("list_of_duplicate_", criteria) <-
df %>%
filter(paste0("duplicate_", criteria))
paste0("list_of_duplicate_", criteria) # a check
}
### Create two lists of duplicates
make_list_of_duplicates("num")
make_list_of_duplicates("txt")
and then continue with some joins etc.
I have been looking into tidy evaluation, assignments, rlang::enexpr(), base::substitute(), get(), mget() and many other things, but after two days of reading and trial and error, I am convinced that there must be another direction to look in that I am not seeing.
I am running MS Open R 4.0.2.
I am grateful for any suggestions.
Sincerely,
Eero
I found the solution to my question once I understood that it was a case of indirection. Because I was on the wrong track, I created lots of complications and made it more difficult than necessary. Thanks to #r2evans, who pointed me in the right direction. I have in the meantime decided that I will use loops instead of functions, but here is the working function:
## Example of using paste inside a function to refer to an object.
library(magrittr)
library(dplyr)
df <- data.frame(
var_a = round(runif(20), digits = 1),
var_b = sample(letters, 20)
)
# Find duplicates
df$duplicate_num <- duplicated(df$var_a)
df$duplicate_txt <- duplicated(df$var_b)
# SEE https://dplyr.tidyverse.org/articles/programming.html#indirection-2
make_list_of_duplicates_f2 <- function(criteria = "num") {
df %>%
# criteria is an ordinary string, so .data[[ ]] needs no {{ }} around it
filter(.data[[paste0("duplicate_", criteria)]])
}
# Create two lists of duplicates
list_of_duplicates_f2_num <-
make_list_of_duplicates_f2("num")
list_of_duplicates_f2_txt <-
make_list_of_duplicates_f2("txt")
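A related sketch for the original goal of getting both results without typing each name by hand: since paste0() returns a character string rather than a name that <- can assign to, one option is to keep the results in a named list and index it by the criteria string. This version uses base R subsetting and takes the data frame as an argument; the names find_duplicates and duplicates, and the small fixed example data, are made up for illustration.

```r
# Hypothetical helper: keep the rows flagged TRUE in duplicate_<criteria>.
find_duplicates <- function(data, criteria = c("num", "txt")) {
  criteria <- match.arg(criteria)
  flag_col <- paste0("duplicate_", criteria)
  data[data[[flag_col]], , drop = FALSE]
}

# Small fixed example so the result is predictable.
example <- data.frame(
  var_a = c(0.1, 0.2, 0.1, 0.3),
  var_b = c("a", "b", "c", "b"),
  stringsAsFactors = FALSE
)
example$duplicate_num <- duplicated(example$var_a)
example$duplicate_txt <- duplicated(example$var_b)

# One list element per criteria, named by the criteria string.
criteria <- c("num", "txt")
duplicates <- lapply(setNames(criteria, criteria),
                     function(k) find_duplicates(example, k))

duplicates[["num"]]  # rows whose var_a value appeared earlier
duplicates[["txt"]]  # rows whose var_b value appeared earlier
```

If separate variables are really needed, base R's assign() can create an object from a string name, though a named list is usually easier to loop over.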
How can I create an image from a data frame? For example:
library(tidyverse)
library(gridExtra)
df = iris %>% slice(1:4)
I have the following, but:
1. I haven't been able to save this to a variable. It just pops up in the Plots pane of RStudio. Am I missing something obvious? I'd like to be able to assign this plot to a variable so I could save it as a PNG or something.
2. Is there a way to remove the row numbers that seem to appear?
3. This look is fine, but is there a way to make it more of a lighter background compared to what this is?
gridExtra::grid.table(df)
To save a relevant variable, use
myTable <- tableGrob(df)
since
grid.table
# function (...)
# grid.draw(tableGrob(...))
# <bytecode: 0x10758c078>
# <environment: namespace:gridExtra>
Given that, you can run
library(grid)
grid.draw(myTable)
To remove the row numbers, you want
myTable <- tableGrob(df, rows = NULL)
For the background, see ?tableGrob and particularly ttheme_default (its source code makes the possible parameters fairly clear). For instance,
myTable <- tableGrob(
df,
rows = NULL,
theme = ttheme_default(core = list(bg_params = list(fill = "grey99")))
)
grid.draw(myTable)
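Since the grob is now stored in a variable, saving it as a PNG can be sketched with the standard graphics-device pattern: open a device, draw, close. The output path, width and height below are arbitrary choices for illustration, not values from the question.

```r
library(gridExtra)
library(grid)

df <- head(iris, 4)
myTable <- tableGrob(df, rows = NULL)

# Open a PNG device, draw the grob on it, then close the device to write the file.
out_file <- file.path(tempdir(), "iris_table.png")  # hypothetical output path
png(out_file, width = 600, height = 200)
grid.draw(myTable)
dev.off()
```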
When I run this loop I can print the results, and I want to create a data frame with this data, but I can't. So far I have this:
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
for (i in 1:numfiles) {
file <- read.table(filenames[i],header = TRUE)
ts = subset(file, file$name == "plantNutrientUptake")
tss = subset (ts, ts$path == "//plants/nitrate")
tssc = tss[,2:3]
d40 = tssc[41,2]
print(d40)
print(filenames[i])
}
This is not the most efficient way to do this, but it takes advantage of what code you've already written. First, you'll create an empty data frame with the columns you want, but filled with NA. Then, in each iteration of the loop, you'll fill one row of the data frame.
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
# Create an empty data.frame
df <- data.frame(filename = rep(NA, numfiles), d40 = rep(NA, numfiles))
for (i in 1:numfiles){
file <- read.table(filenames[i],header = TRUE)
ts = subset(file, file$name == "plantNutrientUptake")
tss = subset (ts, ts$path == "//plants/nitrate")
tssc = tss[,2:3]
d40 = tssc[41,2]
# Fill row i of the data frame
df[i,"filename"] = filenames[i]
df[i,"d40"] = d40
}
Hope that does it! Good luck :)
There are a lot of ways to do what you are asking. Also, without a reproducible example it is difficult to validate that the code will run. I couldn't tell what type of data was in each of your variables, so I just guessed that they were mostly characters with one numeric. You'll need to change the code if that's not true.
The following method is using base R (no other packages). It builds off of what you have done. There are other ways to do this using map, do.call, or apply. But it's important to be able to run through a loop.
As someone commented, your code is just re-writing itself every loop. Luckily you have the variable i that you can use to specify where things go.
filenames <- list.files(path=getwd())
numfiles <- length(filenames)
# Declare an empty dataframe for efficiency purposes
df <- data.frame(
filename = rep(NA_character_,numfiles),
d40 = rep(NA_real_,numfiles),
stringsAsFactors = FALSE
)
# Loop through the files and fill in the data
for (i in 1:numfiles){
file <- read.table(filenames[i],header = TRUE)
# The intermediate subsets are data frames, so keep them as local variables
ts <- subset(file, file$name == "plantNutrientUptake")
tss <- subset(ts, ts$path == "//plants/nitrate")
tssc <- tss[,2:3]
# Store only the single values in row i of the data frame
df$filename[i] <- filenames[i]
df$d40[i] <- tssc[41,2]
print(df$d40[i])
print(filenames[i])
}
You'll notice a few things about this code that are extra.
First, I'm declaring the variable type for each column explicitly. You can use rep(NA,numfiles), but that leaves R to guess what the column should be. This may not be a problem for you if all of your variables are obviously of the same type. But imagine you have a variable a = c("1","A","B") of all characters. R will go through the first iteration of the loop and guess the column type from the first value. Then the second run of the loop will run into a value that does not fit that guess and either fail or silently change the column's type.
Next, I'm declaring the entire dataframe before entering the loop. When people tell you that loops in [modern] R are slow it is often because you are re-allocating memory every loop. By declaring the entire dataframe up front you speed up the loop significantly. This also allows you to reference any cell in the dataframe...which is exactly what you want to do in the loop.
Finally, I'm using the $ syntax to make things clear. Writing df[i,"d40"] <- d40 is the same as writing df$d40[i] <- d40. I just think the second form is clearer. This is a matter of personal preference.
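The lapply() and do.call() route mentioned above could look roughly like this. It is a sketch, not a drop-in replacement: extract_d40 and collect_d40 are made-up helper names, and the code assumes each file has the name and path columns used in the question, with the value of interest in row 41 of the filtered rows and column 3 of the file (which is what tss[,2:3] followed by [41,2] selects).

```r
# Hypothetical helper: read one file and return a one-row data frame.
extract_d40 <- function(filename) {
  file <- read.table(filename, header = TRUE)
  tss <- file[file$name == "plantNutrientUptake" &
                file$path == "//plants/nitrate", ]
  data.frame(filename = filename, d40 = tss[41, 3], stringsAsFactors = FALSE)
}

# Build one row per file, then stack the rows into a single data frame.
collect_d40 <- function(filenames) {
  do.call(rbind, lapply(filenames, extract_d40))
}
```

With the files from the question this would be called as df <- collect_d40(list.files(path = getwd())). No row needs to be preallocated, because each file produces its own small data frame and rbind() combines them once at the end.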
Hey all I apologise if this is very simple and mediocre but I can't seem to create a function that turns a time variable (00:00:00) into a numeric AND creates a new column to put the result in.
I can turn the time into a numeric, I just cannot complete the 'new column' part. Any help is appreciated.
Time <- function(x) {
begin <- x$avg..wait.time
x$Num.wait.time <- as.numeric(as.POSIXct(strptime(begin, "%H:%M:%S")))
}
(NOTE: avg..wait.time is the time cell and
Num.wait.time is the new variable/column I want to create)
If writing a function is not the goal in itself, dplyr lets you tackle the problem directly with existing tools, without writing a separate function.
library(dplyr)
df <- data.frame(avg.wait.time = c("01:02:03", "03:02:01"))
df <- df %>%
dplyr::mutate(
avg.wait.numeric = as.numeric(as.POSIXct(strptime(avg.wait.time, "%H:%M:%S")))
)
If you wish to write a separate function, I would do as follows:
Time <- function(x,
input_var = "avg.wait.time",
output_var = "avg.wait.numeric") {
x[[output_var]] <-
as.numeric(as.POSIXct(strptime(x[[input_var]], "%H:%M:%S")))
return(x)
}
This allows the input variable name and output variable name to be specified, currently set with some arbitrary default values (you can kick these out, of course).
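One caveat worth checking: strptime() with only "%H:%M:%S" fills in the current date, so the numeric value above is the number of seconds since 1970-01-01 for that time today, not the length of the wait. If the goal is the number of seconds the time represents (an assumption about the intent), a base R sketch could split the string and weight the parts. The names hms_to_seconds, add_wait_seconds and avg.wait.seconds are made up for illustration.

```r
# Hypothetical helper: convert "HH:MM:SS" strings to a number of seconds.
hms_to_seconds <- function(x) {
  parts <- strsplit(as.character(x), ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

# Hypothetical wrapper in the style of Time(): add the result as a new column.
add_wait_seconds <- function(x,
                             input_var = "avg.wait.time",
                             output_var = "avg.wait.seconds") {
  x[[output_var]] <- hms_to_seconds(x[[input_var]])
  x
}

df <- data.frame(avg.wait.time = c("01:02:03", "03:02:01"))
add_wait_seconds(df)$avg.wait.seconds
# 3723 10921
```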
I have a question that I can't seem to find the answer to anywhere online. I apologize if it's already been answered, but here goes. I've written a script in R that goes through the process of forecasting for me and returns the best point forecast based on cross-validation and other criteria. I want to save this script as a function, so that I don't have to use the full script every time I forecast. The basic setup of my script is the following:
output <- read.csv("C:/Users/data.csv", header = T)
colnames(output)
month_count = length(output[,1]) ##used in calculations throughout code
current_year = output[1,1]
current_month = output[1,2]
months = 5 #months to forecast out
m = 0
data <- ts(output[,3][c(1:(month_count-m))],
frequency = 12, start = c(current_year,current_month))
#runs all the other steps from here on
The function that I'm writing will look like this, where it takes various inputs, then runs the script and prints back my forecasts:
forecastMe = function(sourcefile,months,m)
{
#runs the data prints out the result
}
The problem I'm having is that I want to be able to enter a directory and file name such as C:/Users/documents/data1.csv into the function (for the sourcefile part) and have it picked up at this step of my R script.
output <- read.csv("C:/Users/sourcefile.csv", header = T)
I can't seem to find a way to get it to do it right. Any ideas or suggestions?
So...
function(sourcefile, etc) {
output <- read.csv(sourcefile, header = T)
etc
}
...that? I don't really see what you're asking exactly.
You were almost there. All you have to do is replace your constants with the variable names you want to pass to the function and delete your declarations you don't need anymore.
forecastMe = function(sourcefile,months,m) {
output <- read.csv(sourcefile, header = T)
colnames(output)
month_count = length(output[,1]) ##used in calculations throughout code
current_year = output[1,1]
current_month = output[1,2]
data <- ts(output[,3][c(1:(month_count-m))],
frequency = 12, start = c(current_year,current_month))
#runs all the other steps from here on
}
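A usage sketch for the function above, with one assumption added: the function needs to return something for the caller to use, so this reduced version returns the ts object explicitly. The name load_series, the default m = 0, and the file contents are made up for illustration; the forecasting steps are left out.

```r
# Hypothetical reduced version: read the file named by `sourcefile` and build the series.
load_series <- function(sourcefile, m = 0) {
  output <- read.csv(sourcefile, header = TRUE)
  month_count <- nrow(output)
  current_year <- output[1, 1]
  current_month <- output[1, 2]
  ts(output[seq_len(month_count - m), 3],
     frequency = 12, start = c(current_year, current_month))
}

# Write a small made-up CSV to a temporary file and pass its path as a string.
example_file <- tempfile(fileext = ".csv")
write.csv(data.frame(year = c(2020, 2020, 2020),
                     month = c(1, 2, 3),
                     value = c(10, 12, 11)),
          example_file, row.names = FALSE)

series <- load_series(example_file)
```

The path is an ordinary character string, so it is passed in quotes, for example load_series("C:/Users/documents/data1.csv").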