Put brackets around variables that end with sd - r

I have a large table and I would like to put brackets around every variable that ends with "_sd".
Here is an example:
a<- c(0,2,3,4,10,7,6,5,4,3)
b_sd<-c(0,2,3,4,8,6,5,4,3,1)
c<- c(0,2,3,4,10,7,6,5,4,3)
d_sd<-c(0,2,3,4,8,6,5,4,3,1)
dta <- data.frame(a=a, b_sd=b_sd, c=c, d_sd=d_sd)
dta
# this is the slow way:
dta[,2] <- paste0("(", dta[,2], ")")
dta[,4] <- paste0("(", dta[,4], ")")
# this is what I want:
dta
The above code works, but doing it column by column is far too slow for the number of variables I have. How can I automate it, i.e. find the variables that end with _sd and put brackets around them?
Thank you.

You can do
# "_sd$" anchors the match to the end of the name, so only names ending in _sd are selected
namesWithSd <- grep("_sd$",names(dta))
dta[namesWithSd] <- lapply(dta[namesWithSd], function(colVals) {
paste0("(",colVals,")")
})
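A minimal sketch of the same idea using base R's endsWith(), which tests for a literal suffix rather than a regular expression. The small table below is a shortened version of the one in the question; note that the bracketed columns become character vectors.

```r
# Shortened version of the example table from the question
dta <- data.frame(a = c(0, 2, 3), b_sd = c(0, 2, 3), c = c(4, 10, 7), d_sd = c(4, 8, 6))

# Logical vector: TRUE for column names that end in "_sd"
is_sd <- endsWith(names(dta), "_sd")

# Wrap each selected column in brackets; the result is character, not numeric
dta[is_sd] <- lapply(dta[is_sd], function(col_vals) paste0("(", col_vals, ")"))

dta
```

Columns a and c are left untouched, so they stay numeric.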

If your dataset is large, try the data.table package for operations like these. Here is a vignette if you want to know more.
Here is the code utilizing the data.table package :
library(data.table)
##Set as data table
setDT(dta)
##Select the relevant variables
sd_names<-grep("_sd$",names(dta),value = TRUE)
dta[,(sd_names):=lapply(.SD,function(x) {paste0("(",x,")")}),.SDcols=sd_names]
###
dta

Related

How to create a string that can be used as LHS and assigned a value to?

I feel stupid for asking such a simple question, but I am hitting my head in the wall.
Why does paste0() create a string that cannot be interpreted as the name of a new object? Is there a different way of creating the LHS that would be better?
As input I have a dataframe. As an output I want to have a new filtered dataframe. This works fine as long as I manually type all the code. However, I am trying to reduce repetition, and therefore I want to create a function that does the same thing, but then it is not working anymore.
library(magrittr)
library(dplyr) # filter() below is dplyr::filter()
df <- data.frame(
var_a = round(runif(20), digits = 1),
var_b = sample(letters, 20)
)
### Find duplicates
df$duplicate_num <- duplicated(df$var_a)
df$duplicate_txt <- duplicated(df$var_b)
df # a check
### Create two lists of duplicates
list_of_duplicate_num <-
df %>%
filter(duplicate_num)
list_of_duplicate_num # a check
list_of_duplicate_txt <-
df %>%
filter(duplicate_txt)
list_of_duplicate_txt # a check
So far everything works as expected.
I would like to simplify the code and make this to a function that takes the arguments "num" or "txt". But I am having problems with creating the LHS.
The below should, in my mind, do the same as the code above.
paste0("list_of_duplicate_", "num") <-
df %>%
filter(duplicate_num)
I do get an error message:
Error in paste0("list_of_duplicate_", "num") <- df %>%
filter(duplicate_num) :
target of assignment expands to non-language object
My goal is to create a function with something like this:
make_list_of_duplicates <- function(criteria = "num") {
paste0("list_of_duplicate_", criteria) <-
df %>%
filter(paste0("duplicate_", criteria))
paste0("list_of_duplicate_", criteria) # a check
}
### Create two lists of duplicates
make_list_of_duplicates("num")
make_list_of_duplicates("txt")
and then continue with some joins etc.
I have been looking into tidy evaluation, assignments, rlang::enexpr(), base::substitute(), get(), mget() and many other things, but after two days of reading and trial and error, I am convinced that there must be another direction to look in that I am not seeing.
I am running MS Open R 4.0.2.
I am grateful for any suggestions.
Sincerely,
Eero
I found the solution to my question once I understood that it was a case of indirection. Because I was on the wrong track, I created lots of complications and made it more difficult than necessary. Thanks to @r2evans, who pointed me in the right direction. In the meantime I have decided that I will use loops instead of functions, but here is the working function:
## Example of using paste inside a function to refer to an object.
library(magrittr)
library(dplyr)
df <- data.frame(
var_a = round(runif(20), digits = 1),
var_b = sample(letters, 20)
)
# Find duplicates
df$duplicate_num <- duplicated(df$var_a)
df$duplicate_txt <- duplicated(df$var_b)
# SEE https://dplyr.tidyverse.org/articles/programming.html#indirection-2
make_list_of_duplicates_f2 <- function(criteria = "num") {
df %>%
filter(.data[[paste0("duplicate_", criteria)]]) # criteria is a plain string, so {{ }} is not needed
}
# Create two lists of duplicates
list_of_duplicates_f2_num <-
make_list_of_duplicates_f2("num")
list_of_duplicates_f2_txt <-
make_list_of_duplicates_f2("txt")
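As for why the original attempt fails: the left-hand side of <- has to be a name (or a call such as x[["a"]]), and paste0() returns a character string, which is neither. A minimal base R sketch of two ways around this, assuming the same duplicate_num / duplicate_txt columns as above: a named list indexed by the pasted string, or assign() if separate objects are really needed. The name duplicates is only illustrative.

```r
set.seed(1)
df <- data.frame(
  var_a = round(runif(20), digits = 1),
  var_b = sample(letters, 20, replace = TRUE)
)
df$duplicate_num <- duplicated(df$var_a)
df$duplicate_txt <- duplicated(df$var_b)

criteria <- c("num", "txt")

# One filtered data frame per criterion, collected in a list
duplicates <- lapply(criteria, function(k) {
  df[df[[paste0("duplicate_", k)]], ]
})

# A pasted string is fine as a list name, unlike as the target of <-
names(duplicates) <- paste0("list_of_duplicate_", criteria)

# If separate objects are required, assign() accepts the name as a string
assign(paste0("list_of_duplicate_", "num"), duplicates[["list_of_duplicate_num"]])
```

Keeping the results in one list usually makes later joins easier, because the list can be looped over with lapply().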

Apply function to all dataframes

I work with SAS files (sas7bdat = dataframes) and SAS formats (sas7bcat).
My sas7bdat files are in a "data" folder, so I can get a list of them in the object files_names.
Here is the first part of my code, working perfectly
files_names <- list.files(here("data"))
nb_files <- length(files_names)
data_names <- vector("list",length=nb_files)
for (i in 1 : nb_files) {
data_names[i] <- strsplit(files_names[i], split=".sas7bdat")
}
for (i in 1:nb_files) {
assign(data_names[[i]],
read_sas(paste(here("data", files_names[i])), "formats/formats.sas7bcat")
)}
but I get some issues when trying to apply function as_factor from package haven (in order to apply labels on my new dataframes and get like SEX = "Male" instead of SEX = 1).
I can make it work dataframe by dataframe like the code below
df_labelled <- haven::as_factor(df, only_labelled = TRUE)
I would like to create a loop, but it didn't work because data_names[i] holds only the name of the dataframe, not the dataframe itself, and as_factor requires a dataframe as its first argument.
I'm quite new to R, thank you very much if someone could help me.
You might want to think about using a different data structure. For example, you can use a named list to store your dataframes; then you can easily loop through them.
In fact you could do everything in one loop. I'm sure there's a more efficient way to do this, but here's an example of one way without changing your code too much:
library(haven)
library(here)
files_names <- list.files(here("data"))
raw_dfs <- list()
labelled_dfs <- list()
for (file_name in files_names) {
# strsplit returns a list, so either extract the first element
# like this
# df_name <- (strsplit(file_name, split=".sas7bdat"))[[1]]
# or use something else like gsub (fixed = TRUE treats the dot literally)
df_name <- gsub(".sas7bdat", '', file_name, fixed = TRUE)
# use [[ ]] so that the whole dataframe is stored as a single list element
raw_dfs[[df_name]] <- read_sas(here("data", file_name), "formats/formats.sas7bcat")
labelled_dfs[[df_name]] <- haven::as_factor(raw_dfs[[df_name]], only_labelled = TRUE)
}
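The named-list pattern can be tried without any SAS files. The sketch below is only a stand-in: it writes two small CSV files to a temporary folder, uses read.csv() in place of read_sas(), and a hypothetical recode_sex() helper in place of haven::as_factor(). The file names and the SEX coding are made up for illustration.

```r
# Create a temporary "data" folder with two small example files
tmp_dir <- tempfile("data_")
dir.create(tmp_dir)
write.csv(data.frame(SEX = c(1, 2, 1)), file.path(tmp_dir, "patients.csv"), row.names = FALSE)
write.csv(data.frame(SEX = c(2, 2)), file.path(tmp_dir, "visits.csv"), row.names = FALSE)

# Read every file into one list and name each element after its file
files <- list.files(tmp_dir, pattern = "\\.csv$", full.names = TRUE)
dfs <- lapply(files, read.csv)
names(dfs) <- tools::file_path_sans_ext(basename(files))

# Hypothetical stand-in for haven::as_factor(): turn the codes into labels
recode_sex <- function(d) {
  d$SEX <- factor(d$SEX, levels = c(1, 2), labels = c("Male", "Female"))
  d
}

# Apply the same function to every dataframe in the list
labelled_dfs <- lapply(dfs, recode_sex)
labelled_dfs$patients
```

Because lapply() keeps the names of its input, each labelled dataframe can still be reached by the name of the file it came from.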

R: Recoding Variables Across Multiple Objects

Thank you in advance for your advice. I am trying to create a new variable over multiple objects in a loop. These new variables are generated by a function.
For example, I have three sets of country-level data:
# Generate Example Data
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
I want to create a new variable, called “countrycode”, using the countrycode() command in the countrycode package.
I would perform the operation on individual objects like this:
library(countrycode)
pop$ccode <- countrycode(pop$country,"iso2c","cown")
pop$id <- (pop$ccode*10000)+pop$year
But I have a large number of objects. I was hoping to do this over a loop, like this
# Create list of variables
vars <- c("pop","gas","cars")
for (i in vars){
i$ccode <- countrycode(country,"iso2c","cown")
i$id <- (i$ccode*10000)+i$year
}
But that doesn’t work. I’ve been trying to do this using assign() in loops and apply(), but I’m too dense to get my head around how to make this work in my case.
If someone could provide me with an example of how to do this with my own type of data, I’d be very grateful.
Would this work for you?
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
attachCodes <- function(dframe)
{
df <- dframe
df$ccode <- countrycode(df$country,"iso2c","cown")
df$id <- (df$ccode*10000)+df$year
return(df)
}
tablesList <- list(pop,gas,cars)
tablesList <- lapply(tablesList,attachCodes)
Special thanks to @Pawel for supplying the missing information needed to solve the problem. The solution was:
rm(list=ls())
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
attachCodes <- function(dframe)
{
df <- dframe
df$ccode <- countrycode(df$country,"iso2c","cown")
df$id <- (df$ccode*10000)+df$year
return(df)
}
names <- list("pop","gas","cars")
for(i in names){
assign(i,attachCodes(get(i)))
}
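An alternative to get()/assign() is to keep the tables in a named list, so the results stay addressable by name. The sketch below assumes nothing beyond base R: code_lookup is a small hand-written vector standing in for the countrycode() call, included only so that the example runs without the package, and attach_codes mirrors attachCodes above.

```r
pop <- data.frame(country = c("US", "US", "CA", "CA", "FR", "FR"),
                  year = c(1, 2, 1, 2, 1, 2),
                  value = c(290, 300, 29, 30, 50, 55))
gas <- data.frame(country = c("US", "US", "CA", "CA", "FR", "FR"),
                  year = c(1, 2, 1, 2, 1, 2),
                  value = c(3.10, 1.80, 4.50, 2.50, 4.50, 2.50))

# Illustrative stand-in for countrycode(country, "iso2c", "cown")
code_lookup <- c(US = 2, CA = 20, FR = 220)

attach_codes <- function(d) {
  # as.character() guards against factor columns being used as integer indices
  d$ccode <- unname(code_lookup[as.character(d$country)])
  d$id <- (d$ccode * 10000) + d$year
  d
}

# Named list: each table keeps its name after lapply()
tables <- list(pop = pop, gas = gas)
tables <- lapply(tables, attach_codes)

tables$pop
```

If the modified tables are needed back as separate objects, list2env(tables, envir = globalenv()) would copy them over the originals.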

merge tables in Loop using R

I have a simple question regarding a loop that I wrote. I want to access different files in different directories, extract data from these files and combine it into one table. My problem is that my loop does not accumulate the results of the different files; the table only ever holds the species currently in the loop. Here is my code:
for(i in 1:length(splist.par))
{
results<-read.csv(paste(getwd(),"/ResultsR10arcabiotic/",splist.par[i],"/","maxentResults.csv",sep=""),h=T)
species <- splist.par[i]
AUC <- results$Test.AUC[1:10]
AUC_SD <- results$AUC.Standard.Deviation[1:10]
Variable <- "a"
Resolution <- "10arc"
table <-cbind(species,AUC,AUC_SD,Variable,Resolution)
}
This is probably an easy question but I am not an experienced programmer. Thanks for the attention
Gabriel
I'd use lapply to get the desired data from each file and add the Species information, and then combine with rbind. Something like this (untested):
do.call(rbind, lapply(splist.par, function(x) {
d <- read.csv(file.path("ResultsR10arcabiotic", x, "maxentResults.csv"))
d <- d[1:10, c("Test.AUC", "AUC.Standard.Deviation")]
names(d) <- c("AUC", "AUC_SD")
cbind(Species=x, d, stringsAsFactors=FALSE)
}))
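@Aaron's lapply answer is good and clean. But to debug your code: you put a bunch of data into table but overwrite table on every iteration. You need to create an empty object before the loop (table <- NULL) and append the new rows to it inside the loop:
table <- rbind(table, cbind(species,AUC,AUC_SD,Variable,Resolution))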
BTW, since table is a function in R, I'd avoid using it as a variable name. Imagine:
table(table)
:-)
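A common way to write the accumulating loop is to store each iteration's rows in a list and bind them once at the end. The sketch below runs without the MaxEnt output: fake_results() is a hypothetical stand-in for the read.csv() call, and the species names are made up.

```r
splist.par <- c("species_a", "species_b")

# Hypothetical stand-in for reading maxentResults.csv for one species
fake_results <- function(sp) {
  data.frame(Test.AUC = seq(0.70, 0.79, length.out = 10),
             AUC.Standard.Deviation = rep(0.02, 10))
}

# One list slot per species, filled inside the loop
pieces <- vector("list", length(splist.par))
for (i in seq_along(splist.par)) {
  results <- fake_results(splist.par[i])
  pieces[[i]] <- data.frame(species = splist.par[i],
                            AUC = results$Test.AUC[1:10],
                            AUC_SD = results$AUC.Standard.Deviation[1:10],
                            Variable = "a",
                            Resolution = "10arc")
}

# Stack all per-species pieces into a single table
all_results <- do.call(rbind, pieces)
```

Binding once at the end avoids copying the growing table on every iteration, and using data.frame() rather than cbind() on vectors keeps AUC and AUC_SD numeric.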

Loop over string variables in R

When programming in Stata I often find myself using the loop index in the programming. For example, I'll loop over a list of the variables nominalprice and realprice:
local list = "nominalprice realprice"
foreach i of local list {
summarize `i'
twoway (scatter `i' time)
graph export "C:\TimePlot-`i'.png"
}
This will plot the time series of nominal and real prices and export one graph called TimePlot-nominalprice.png and another called TimePlot-realprice.png.
In R the method I've come up with to do the same thing would be:
clist <- c("nominalprice", "realprice")
for (i in clist) {
e <- paste("png(\"c:/TimePlot-",i,".png\")", sep="")
eval(parse(text=e))
plot(time, eval(parse(text=i)))
dev.off()
}
This R code looks unintuitive and messy to me and I haven't found a good way to do this sort of thing in R yet. Maybe I'm just not thinking about the problem the right way? Can you suggest a better way to loop using strings?
As other people have intimated, this would be easier if you had a dataframe with columns named nominalprice and realprice. If you do not, you could always use get. You shouldn't need parse at all here.
clist <- c("nominalprice", "realprice")
for (i in clist) {
png(paste("c:/TimePlot-",i,".png", sep=""))
plot(time, get(i))
dev.off()
}
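For the dataframe case mentioned above, the loop variable can index the column directly with [[ ]], so neither get() nor eval(parse()) is needed. A minimal sketch with made-up data; the prices dataframe and the use of tempdir() as the output folder are assumptions for illustration.

```r
# Made-up example data with one column per price series
prices <- data.frame(time = 1:10,
                     nominalprice = seq(100, 118, by = 2),
                     realprice = seq(100, 109, by = 1))

clist <- c("nominalprice", "realprice")
out_files <- file.path(tempdir(), paste0("TimePlot-", clist, ".png"))

for (k in seq_along(clist)) {
  png(out_files[k])
  # prices[[name]] extracts a column by its name given as a string
  plot(prices$time, prices[[clist[k]]], xlab = "time", ylab = clist[k])
  dev.off()
}
```

This is the closest R equivalent of the Stata loop: the string is used both to build the file name and to pick the variable.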
If your main issue is the need to type eval(parse(text=i)) instead of `i', you could create a simpler-to-use function for evaluating expressions from strings:
e = function(expr) eval(parse(text=expr))
Then the R example could be simplified to:
clist <- c("nominalprice", "realprice")
for (i in clist) {
png(paste("c:/TimePlot-", i, ".png", sep=""))
plot(time, e(i))
dev.off()
}
Using ggplot2 and reshape:
library(ggplot2)
library(reshape)
df <- data.frame(nominalprice=rexp(10), time=1:10)
df <- transform(df, realprice=nominalprice*runif(10,.9,1.1))
dfm <- melt(df, id.var=c("time"))
qplot(time, value, facets=~variable, data=dfm)
I don't see what's especially wrong with your original solution, except that I don't know why you're using the eval() function. That doesn't seem necessary to me.
You can also use an apply function, such as lapply. Here's a working example. I created dummy data as a zoo() time series (this isn't necessary, but since you're working with time series data anyway):
library(zoo)
# x <- some time series data
time <- as.Date("2003-02-01") + c(1, 3, 7, 9, 14) - 1
x <- zoo(data.frame(nominalprice=rnorm(5),realprice=rnorm(5)), time)
lapply(c("nominalprice", "realprice"), function(c.name, x) {
png(paste("c:/TimePlot-", c.name, ".png", sep=""))
plot(x[,c.name], main=c.name)
dev.off()
}, x=x)
