R: Recoding Variables Across Multiple Objects


Thank you in advance for your advice. I am trying to create a new variable over multiple objects in a loop. These new variables are generated by a function.
For example, I have three sets of country-level data:
# Generate Example Data
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
I want to create a new variable, called "ccode", using the countrycode() function from the countrycode package.
I would perform the operation on individual objects like this:
library(countrycode)
pop$ccode <- countrycode(pop$country,"iso2c","cown")
pop$id <- (pop$ccode*10000)+pop$year
But I have a large number of objects, so I was hoping to do this in a loop, like this:
# Create list of variables
vars <- c("pop","gas","cars")
for (i in vars){
i$ccode <- countrycode(country,"iso2c","cown")
i$id <- (i$ccode*10000)+i$year
}
But that doesn’t work. I’ve been trying to do this using assign() in loops and apply(), but I’m too dense to get my head around how to make this work in my case.
If someone could provide me with an example of how to do this with my own type of data, I’d be very grateful.

Would this work for you?
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
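library(countrycode)
attachCodes <- function(dframe) {
  # Add the Correlates of War numeric code and a country-year id
  dframe$ccode <- countrycode(dframe$country, "iso2c", "cown")
  dframe$id <- (dframe$ccode * 10000) + dframe$year
  dframe
}
# A named list keeps track of which table is which
tablesList <- list(pop = pop, gas = gas, cars = cars)
tablesList <- lapply(tablesList, attachCodes)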

Special thanks to @Pawel for supplying the missing information needed to solve the problem. The solution was:
rm(list=ls())
pop <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(290,300,29,30,50,55))
gas <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country=c("US","US","CA","CA","FR","FR"),year=c(1,2,1,2,1,2),value=c(2.1,2.2,1.8,1.9,1.3,1.3))
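attachCodes <- function(dframe) {
  df <- dframe
  df$ccode <- countrycode(df$country, "iso2c", "cown")
  df$id <- (df$ccode * 10000) + df$year
  return(df)
}
# Object names as strings (called tableNames so that base::names() is not masked)
tableNames <- c("pop", "gas", "cars")
for (i in tableNames) {
  # get() fetches the object by name, assign() writes the result back under the same name
  assign(i, attachCodes(get(i)))
}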
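A variant that avoids assign() and get() altogether is to keep the tables in a named list from the start. The sketch below is meant to run on its own, so it uses a small stand-in lookup vector (cow_lookup, a name made up here, with illustrative values) in place of countrycode(); with the package loaded, the marked line would be swapped for the countrycode() call used above.

```r
# Stand-in for countrycode(x, "iso2c", "cown"); values are illustrative only
cow_lookup <- c(US = 2, CA = 20, FR = 220)

pop  <- data.frame(country = c("US","US","CA","CA","FR","FR"),
                   year = c(1,2,1,2,1,2), value = c(290,300,29,30,50,55))
gas  <- data.frame(country = c("US","US","CA","CA","FR","FR"),
                   year = c(1,2,1,2,1,2), value = c(3.10,1.80,4.50,2.50,4.50,2.50))
cars <- data.frame(country = c("US","US","CA","CA","FR","FR"),
                   year = c(1,2,1,2,1,2), value = c(2.1,2.2,1.8,1.9,1.3,1.3))

attach_codes <- function(df) {
  # Replace this line with countrycode(df$country, "iso2c", "cown") in real use
  df$ccode <- unname(cow_lookup[as.character(df$country)])
  df$id <- df$ccode * 10000 + df$year
  df
}

# The list names record which table is which
tables <- list(pop = pop, gas = gas, cars = cars)
tables <- lapply(tables, attach_codes)

tables$pop$id
```

Each table is then available as tables$pop, tables$gas and so on. If separate objects are really needed afterwards, list2env(tables, envir = globalenv()) should copy them back into the workspace.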

Related

Create dataset to store results in a for loop in R

I have been stuck on some R code for several hours now. Initially I thought it would be very simple, but nothing I have tried has worked.
I am writing code that imports a number of databases and, for each one, calculates the average ratio of NA values, zero values and empty values. For every database it creates an auxiliary data frame holding the variable names and the ratio of missing values for each variable.
The problem is storing that auxiliary data frame. It should be stored under the name of the original database, so the name depends on the index k that iterates over all the databases. Every attempt at something like base_[k], where k varies with the name of the database, has failed.
Has anyone run into something like this? I don't know what else to try. Thanks a lot. Here is the code so you can understand it a little better.
rm(list = ls())
setwd("C:/Users/Kevin/Escritorio/UK 2022.05.24")
listcsv <- dir(pattern = "*.csv") # creates the list of all the csv files in the directory
results <- as.data.frame(listcsv)
results$mean_na_ratio <- -777
results$mean_zero_ratio <- -777
results$mean_no_value_ratio <- -777
for (k in 1:length(listcsv)){
  df <- read.csv(listcsv[k],stringsAsFactors=FALSE)
  c1 <- colMeans(is.na(df))
  results[k, "mean_na_ratio"] <- mean(c1)
  vars_vector <- colnames(df)
  vars_dataframe <- as.data.frame(vars_vector)
  rownames(vars_dataframe) <- vars_dataframe$vars_vector
  for (i in vars_vector){
    df[,i] <- as.character(df[,i])
    df$temp <- df[,i]
    vars_dataframe[i, "mean_zero_ratio"] <- nrow(subset(df, temp=="0"))/nrow(df)
    vars_dataframe[i, "mean_no_value_ratio"] <- nrow(subset(df, temp==""))/nrow(df)
  }
  vars_dataframe[is.na(vars_dataframe)] <- 0
  results[k, "mean_zero_ratio"] <- mean(vars_dataframe$mean_zero_ratio)
  results[k, "mean_no_value_ratio"] <- mean(vars_dataframe$mean_no_value_ratio)
  **data_k <- vars_dataframe**
}
The problem is marked in bold
Thank you so much.
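One possible way around the base_[k] naming problem, sketched under assumptions: instead of creating a separately named object per file, store each auxiliary data frame in a named list keyed by the file name. The example writes two small made-up CSV files to a temporary folder so that it can run on its own; per_file, the folder name and the file names are placeholders, not part of the original code.

```r
# Two small made-up CSV files in a temporary folder
tmp <- file.path(tempdir(), "ratio_demo")
dir.create(tmp, showWarnings = FALSE)
write.csv(data.frame(a = c(1, NA, 0), b = c("x", "", "0")),
          file.path(tmp, "first.csv"), row.names = FALSE)
write.csv(data.frame(c = c(0, 0, 5, 6)),
          file.path(tmp, "second.csv"), row.names = FALSE)

listcsv <- list.files(tmp, pattern = "\\.csv$", full.names = TRUE)
per_file <- list()

for (f in listcsv) {
  df <- read.csv(f, stringsAsFactors = FALSE)
  stats <- data.frame(
    variable = names(df),
    na_ratio = unname(colMeans(is.na(df))),
    # share of rows equal to "0" (NA rows are not counted as matches)
    zero_ratio = vapply(df, function(x) sum(as.character(x) == "0", na.rm = TRUE) / length(x),
                        numeric(1), USE.NAMES = FALSE),
    # share of rows that are empty strings
    no_value_ratio = vapply(df, function(x) sum(as.character(x) == "", na.rm = TRUE) / length(x),
                            numeric(1), USE.NAMES = FALSE)
  )
  # Key the result by the file name without its extension
  per_file[[tools::file_path_sans_ext(basename(f))]] <- stats
}

names(per_file)
```

The auxiliary table for a given file is then per_file[["first"]], and lapply() or sapply() can summarise all of them at once. If separate objects are truly required, assign(paste0("base_", name), stats) inside the loop would be the assign()-based equivalent, with name being the same file name string.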

Put brackets around variables that end with sd

I have a large table and I would like to put brackets around every variable that ends with "_sd".
Here is an example:
a<- c(0,2,3,4,10,7,6,5,4,3)
b_sd<-c(0,2,3,4,8,6,5,4,3,1)
c<- c(0,2,3,4,10,7,6,5,4,3)
d_sd<-c(0,2,3,4,8,6,5,4,3,1)
dta <- data.frame(a=a, b_sd=b_sd, c=c, d_sd=d_sd)
dta
# this is the slow way:
dta[,2] <- paste0("(", dta[,2], ")")
dta[,4] <- paste0("(", dta[,4], ")")
# this is what I want:
dta
The above code works, but doing it column by column is too slow for the number of variables I have. How can I automate it, i.e. find all the variables whose names end with _sd and put brackets around their values?
Thank you.

You can do:
# "_sd$" matches only names that end with _sd
namesWithSd <- grep("_sd$", names(dta))
dta[namesWithSd] <- lapply(dta[namesWithSd], function(colVals) {
  paste0("(", colVals, ")")
})
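Since the question asks for names that end with _sd, the anchoring of the pattern matters. A small sketch with made-up columns (c_sd_raw is invented here to show the difference) compares an unanchored pattern with one anchored by $:

```r
dta <- data.frame(a = 1:3, b_sd = c(0.5, 1, 1.5), c_sd_raw = 4:6)

loose  <- grep("_sd",  names(dta), value = TRUE)  # "_sd" anywhere in the name
strict <- grep("_sd$", names(dta), value = TRUE)  # "_sd" only at the end

# Wrap only the columns whose names really end in _sd
dta[strict] <- lapply(dta[strict], function(x) paste0("(", x, ")"))
dta
```

Note that the wrapped columns become character vectors, so this step is best done last, just before the table is printed or exported.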

If your dataset is large, try the data.table package for operations like these. Here is a vignette if you want to know more.
Here is the code using the data.table package:
library(data.table)
## Convert to a data.table by reference
setDT(dta)
## Select the variables whose names end with _sd
sd_names <- grep("_sd$", names(dta), value = TRUE)
## Update the selected columns in place
dta[, (sd_names) := lapply(.SD, function(x) paste0("(", x, ")")), .SDcols = sd_names]
dta

Apply function to all dataframes

I work with SAS files (sas7bdat = dataframes) and SAS formats (sas7bcat).
My sas7bdat files are in a "data" folder, so I can get a list of them in the object files_names.
Here is the first part of my code, which works perfectly:
files_names <- list.files(here("data"))
nb_files <- length(files_names)
data_names <- vector("list",length=nb_files)
for (i in 1:nb_files) {
  data_names[i] <- strsplit(files_names[i], split=".sas7bdat")
}
for (i in 1:nb_files) {
  assign(data_names[[i]],
         read_sas(paste(here("data", files_names[i])), "formats/formats.sas7bcat")
  )
}
But I run into problems when I try to apply the as_factor function from the haven package (in order to apply labels to my new dataframes and get, for example, SEX = "Male" instead of SEX = 1).
I can make it work dataframe by dataframe, like this:
df_labelled <- haven::as_factor(df, only_labelled = TRUE)
I would like to do this in a loop, but it didn't work because data_names[i] is a name, not a dataframe, and as_factor requires a dataframe as its first argument.
I'm quite new to R, so thank you very much if someone can help me.

You might want to use a different data structure: for example, a named list to hold your dataframes, which you can then loop through easily.
In fact you could do everything in one loop. I'm sure there's a more efficient way to do this, but here's one way that doesn't change your code too much:
library(haven)
library(here)
files_names <- list.files(here("data"))
raw_dfs <- list()
labelled_dfs <- list()
for (file_name in files_names) {
  # strsplit() returns a list, so you would have to extract its first element:
  # df_name <- strsplit(file_name, split = ".sas7bdat", fixed = TRUE)[[1]]
  # gsub() with fixed = TRUE is simpler and treats the dot literally
  df_name <- gsub(".sas7bdat", "", file_name, fixed = TRUE)
  # use [[ ]] so the whole dataframe is stored, not just its first column
  raw_dfs[[df_name]] <- read_sas(here("data", file_name), "formats/formats.sas7bcat")
  labelled_dfs[[df_name]] <- haven::as_factor(raw_dfs[[df_name]], only_labelled = TRUE)
}
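One detail worth checking when storing dataframes in a list is the difference between single and double brackets on assignment. The base R sketch below uses a made-up two-column dataframe, and factor() with hypothetical labels as a stand-in for haven::as_factor(), so it runs without any SAS files:

```r
df <- data.frame(sex = c(1, 2), age = c(30, 41))

single <- list()
# Single brackets treat the dataframe as a list of columns, so only the
# first column fits into the one slot (R warns about the leftover column)
suppressWarnings(single["df"] <- df)

double <- list()
double[["df"]] <- df  # double brackets store the whole dataframe

is.data.frame(single[["df"]])
is.data.frame(double[["df"]])

# Applying a labelling step to every dataframe in the list
labelled <- lapply(double, function(d) {
  d$sex <- factor(d$sex, levels = c(1, 2), labels = c("Male", "Female"))
  d
})
labelled$df
```

With the real data, the function passed to lapply() would simply be the haven::as_factor() call with only_labelled = TRUE.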

Reading nodes from multiple html and storing result as a vector

I have a list of locally saved html files. I want to extract multiple nodes from each html file and save the results in vectors, which I would then like to combine in a dataframe. I have a piece of code for one node that works (see below), but it seems long and inefficient if I have to repeat it for about 20 variables. There is also something strange about how the vector (XXX_name) is filled: it starts with the last observation and then continues with the first, second, and so on. Do you have any suggestions for simplifying the code or making it more efficient?
# Extracts name variable and stores in a vector
XXX_name <- c()
for (i in 1:216) {
XXX_name <- c(XXX_name, name)
mydata <- read_html(files[i], encoding = "latin-1")
reads_name <- html_nodes(mydata, 'h1')
name <- html_text(reads_name)
#print(i)
#print(name)
}
Many thanks!

You can put the work inside a function, then apply that function to each combination of file and node with map2().
First, create the function:
library(rvest)
read_names <- function(var, node) {
  mydata <- read_html(files[var], encoding = "latin-1")
  reads_name <- html_nodes(mydata, node)
  # return the text explicitly instead of ending on an assignment
  html_text(reads_name)
}
Then create a dataframe with all possible combinations of inputs and apply the function to it (vector_of_nodes is a character vector of the selectors you want, such as 'h1'):
library(tidyverse)
inputs <- crossing(var = 1:216, node = vector_of_nodes)
output <- map2(inputs$var, inputs$node, read_names)
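As for the odd ordering in the original loop, a likely cause is that the value is appended before it is computed: each pass adds the name left over from the previous pass, and the very first pass picks up whatever name still held from an earlier run. A base R sketch with a made-up character vector (values stands in for the text read from the files) shows the corrected order of operations:

```r
values <- c("first", "second", "third")  # stand-in for the text read from each file

# Compute the value first, then append it
out <- character(0)
for (i in seq_along(values)) {
  name <- values[i]
  out <- c(out, name)
}
out

# The same result without growing a vector inside a loop
out_vapply <- vapply(seq_along(values), function(i) values[i], character(1))
```

In the original code this would mean moving the XXX_name <- c(XXX_name, name) line to the end of the loop body, after html_text() has been called.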

R Script Function not returning data

I've been searching for a while now and can't seem to come up with an answer. I'm just creating a simple function for some statistical data that I'm pulling from a list and manipulating it and create averages and whatnot. The function isn't returning anything though. No errors are being produced and the matrix is being created.
Source:
library(matrixStats)
source("Control_Function.R")
mydata <- read.table("DataSmall.txt")
length <- nrow(mydata)
a<-length/11
# beginning control limits
control(mydata)
Control_Function.R
control <- function(arg1){
mat1 <-matrix(unlist(arg1),11,25)
matAverage <-colMeans(mat1)
matSdAv <- colSds(mat1)
sbar <-mean(matSdAv)
xbarbar<-mean(matAverage)
newlist<-list(matAverage, matSdAv, sbar, xbarbar)
return(newlist)
}
Any help would be greatly appreciated.
Thanks

If you just want to see the answers, then you need to add print() calls like this:
control <- function(arg1){
  mat1 <- matrix(unlist(arg1), 11, 25)
  print(matAverage <- colMeans(mat1))
  print(matSdAv <- colSds(mat1))
  print(sbar <- mean(matSdAv))
  print(xbarbar <- mean(matAverage))
  newlist <- list(matAverage, matSdAv, sbar, xbarbar)
  return(newlist)
}
If you want to actually save one of those objects to your workspace (not the function environment), then use <<- for the object you want to save (here the object added to the workspace is "newlist"):
control <- function(arg1){
  mat1 <- matrix(unlist(arg1), 11, 25)
  print(matAverage <- colMeans(mat1))
  print(matSdAv <- colSds(mat1))
  print(sbar <- mean(matSdAv))
  print(xbarbar <- mean(matAverage))
  # <<- assigns newlist outside the function, in the enclosing (here global) environment
  newlist <<- list(matAverage, matSdAv, sbar, xbarbar)
}
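Another option, which needs neither print() inside the function nor <<-, is to assign the returned list to a name in the calling script and print that. If the script is run through source(), the value of a bare top-level call such as control(mydata) is not auto-printed, which may be why nothing appeared. The sketch below is self-contained under some assumptions: the data is random, apply(mat1, 2, sd) replaces colSds() so that matrixStats is not required, and the list element names are invented here.

```r
control <- function(arg1, n_rows = 11) {
  mat1 <- matrix(unlist(arg1), nrow = n_rows)
  col_means <- colMeans(mat1)
  col_sds <- apply(mat1, 2, sd)  # base R stand-in for matrixStats::colSds()
  list(col_means = col_means,
       col_sds = col_sds,
       sbar = mean(col_sds),
       xbarbar = mean(col_means))
}

set.seed(1)
mydata <- as.data.frame(matrix(rnorm(11 * 25), nrow = 11))  # made-up data

result <- control(mydata)  # keep the returned list
print(result$xbarbar)      # explicit print also works under source()
```

Naming the list elements makes the individual pieces available as result$sbar, result$col_sds and so on.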
