How to access data saved in an assign construct? - r

I made a list, loop over it with a for loop, do some calculations on each element, and export each modified data frame to a variable named "IAEA_C2_NoStdConditionResiduals1", "IAEA_C2_EAstdResiduals2", etc. When I call View(IAEA_C2_NoStdConditionResiduals1) after the loop, I get the following error message in the console: Error in print(IAEA_C2_NoStdConditionResiduals1) : object 'IAEA_C2_NoStdConditionResiduals1' not found. But I know the object is there, because RStudio shows it in its Environment view. So the question is: how can I access the data saved with assign for further use?
ResidualList = list(IAEA_C2_NoStdCondition = IAEA_C2_NoStdCondition,
IAEA_C2_EAstd = IAEA_C2_EAstd,
IAEA_C2_STstd = IAEA_C2_STstd,
IAEA_C2_Bothstd = IAEA_C2_Bothstd,
TIRI_I_NoStdCondition = TIRI_I_NoStdCondition,
TIRI_I_EAstd = TIRI_I_EAstd,
TIRI_I_STstd = TIRI_I_STstd,
TIRI_I_Bothstd = TIRI_I_Bothstd
)
C = 8
for (j in 1:C) {
  # Convert the list element name to a string, used later as a unique variable name
  SubNameString = names(ResidualList)[j]
  SubNameString = paste0(SubNameString, "Residuals")
  #print(SubNameString)
  LoopVar = ResidualList[[j]]
  LoopVar[ ,"F_corrected_normed"] = round(LoopVar[ ,"F_corrected_normed"] / mean(LoopVar[ ,"F_corrected_normed"]),
                                          digits = 5
  )
  LoopVar[ ,"F_corrected_normed_error"] = round(LoopVar[ ,"F_corrected_normed_error"] / mean(LoopVar[ ,"F_corrected_normed_error"]),
                                                digits = 5
  )
  assign(paste(SubNameString, j), LoopVar)
}
View(IAEA_C2_NoStdConditionResiduals1)

This is not really a problem with assign but with the behavior of the paste function, whose default separator is a space. This will build a variable name with a space in it:
assign(paste(SubNameString, j), LoopVar)
#simple example
> assign(paste("v", 1), "test")
> `v 1`
[1] "test"
So you need to get its value by putting backticks around its name, so that the space is not misinterpreted as a parseable delimiter. See what happens when you type:
`IAEA_C2_NoStdConditionResiduals 1`
From here forward, use paste0 (or paste with sep = "") to avoid this problem.
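A minimal sketch of the difference, using made-up names (res, the value 42, and the results list are illustrative and not from the question). It also shows a named list as one possible alternative to assign, which keeps all results in a single object:

```r
# paste() separates its arguments with a space by default; paste0() does not.
name_with_space <- paste("res", 1)    # "res 1"
name_plain      <- paste0("res", 1)   # "res1"

assign(name_with_space, 42)
assign(name_plain, 42)

# A name containing a space needs backticks (or get()) to be accessed.
value_a <- `res 1`
value_b <- get("res 1")

# A syntactically valid name can be used directly.
value_c <- res1

# Alternative: store results in a named list instead of separate variables.
results <- list()
results[[name_plain]] <- 42
results$res1
```

With the list approach, the loop results can be reached by name (results[["res1"]]) or iterated over with lapply, without building variable names at all.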

Related

Issue scraping flairs from Reddit: Arguments imply differing number of rows?

I'm trying to scrape a subreddit using a RedditExtractoR library that I've modified, but I keep running into this error. I modified the get_thread_content.R file in the source code so that it looks like this:
#' Get thread contents of Reddit URLs
#'
#' This function takes a collection of URLs and returns a list with 2 data frames:
#' 1. a data frame containing meta data describing each thread
#' 2. a data frame with comments found in all threads
#'
#' The URLs are being retained in both tables which would allow you to join them if needed
#'
#' @param urls A vector of strings pointing to a Reddit thread
#' @return A list with 2 data frames "threads" and "comments"
#' @export
get_thread_content <- function(urls){
data <- lapply(urls, parse_thread_url)
list(
threads = lapply(data, function(z) z[["thread"]]) |> rbind_list(),
comments = lapply(data, function(z) z[["comments"]]) |> remove_na() |> rbind_list()
)
}
# Build a data frame with thread attributes of interest
build_thread_content_df <- function(json, request_url) {
if (is.null(json$link_flair_text))
{thread_flair_info <- 'no flair'}
else {thread_flair_info <- json$link_flair_text}
if (is.null(json$author_flair_text))
{author_flair_info <- 'no flair'}
else {author_flair_info <- json$author_flair_text}
df <- data.frame(
url = strip_json(request_url),
author = json$author,
author_flair_text = author_flair_info,
date = timestamp_to_date(json$created_utc),
timestamp = json$created_utc,
title = json$title,
text = json$selftext,
thread_flair = thread_flair_info,
subreddit = json$subreddit,
score = json$score,
upvotes = json$ups,
downvotes = json$downs,
up_ratio = json$upvote_ratio,
total_awards_received = json$total_awards_received,
golds = json$gilded,
cross_posts = json$num_crossposts,
comments = json$num_comments,
stringsAsFactors = FALSE
)
return(df)
}
nullfix <- function(x){
if(is.null(x))
{x <- "no flair"}
else {x}
}
# Build a data frame with comments and their attributes.
build_comments_content_df <- function(json, request_url) {
data.frame(
url = strip_json(request_url),
author = extract_comments_attributes(json, "author"),
comment_author_flair = nullfix(extract_comments_attributes(json, "author_flair_text")),
date = extract_comments_attributes(json, "created_utc") |> timestamp_to_date(),
timestamp = extract_comments_attributes(json, "created_utc"),
score = extract_comments_attributes(json, "score"),
upvotes = extract_comments_attributes(json, "ups"),
downvotes = extract_comments_attributes(json, "downs"),
golds = extract_comments_attributes(json, "gilded"),
comment = extract_comments_attributes(json, "body"),
comment_id = build_comment_ids(json),
stringsAsFactors = FALSE
)
}
Everything works except the comment_author_flair part. I initially tried code similar to the flair handling in build_thread_content_df, but that failed, so I split the NULL check out into its own function (see nullfix) and applied it to the original code. That still isn't working, as I get the same error.
As a reproducible example, after modifying that library:
thread <- get_thread_content("https://www.reddit.com/r/SSBM/comments/10ys20y/who_were_the_least_clutch_players/")
I would expect to get 69 flair values with this code, one for each comment, with NULL values replaced by "no flair" because of the nullfix function. Instead I get the error:
Error in data.frame(url = strip_json(request_url), author = extract_comments_attributes(json, :
arguments imply differing number of rows: 1, 69, 29
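The row counts in the error (69 comments, 29 flairs) would be consistent with NULL flairs being dropped before nullfix ever sees them. The sketch below illustrates that mechanism on made-up data. It assumes, without having checked the package internals, that the extractor flattens its per-comment values with something like unlist(); the flairs list is a hypothetical stand-in:

```r
# Hypothetical stand-in for per-comment flair values; NULL means no flair is set.
flairs <- list("Fox", NULL, "Marth", NULL)

# unlist() silently drops NULL elements, so the result is shorter than the input.
flat <- unlist(flairs)
n_flat <- length(flat)   # 2, not 4

# A check like is.null(flat) on the whole vector is FALSE here,
# so a whole-vector replacement never fires.
whole_is_null <- is.null(flat)

# Replacing NULLs element by element, before flattening, keeps one value per comment.
fixed <- vapply(
  flairs,
  function(x) if (is.null(x)) "no flair" else x,
  character(1)
)
```

If that assumption holds, the replacement would have to happen per comment, inside or before the flattening step, rather than on the already flattened vector.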

Using an R function to hash values produces a repeating value across rows

I'm using the following query:
let
Source = {1..5},
#"Converted to Table" = Table.FromList(Source, Splitter.SplitByNothing(), {"Numbers"}, null, ExtraValues.Error),
#"Added Custom" = Table.AddColumn(#"Converted to Table", "Letters", each Character.FromNumber([Numbers] + 64)),
#"Run R script" = R.Execute("# 'dataset' holds the input data for this script#(lf)#(lf)library(""digest"")#(lf)#(lf)dataset$SuffixedLetters <- paste(dataset$Letters, ""_suffix"")#(lf)dataset$HashedLetters <- digest(dataset$Letters, ""md5"", serialize = TRUE)#(lf)output<-dataset",[dataset=#"Added Custom"]),
output = #"Run R script"{[Name="output"]}[Value]
in
output
which leads to the resulting table:
And here is the R script with better formatting:
# 'dataset' holds the input data for this script
library("digest")
dataset$SuffixedLetters <- paste(dataset$Letters, "_suffix")
dataset$HashedLetters <- digest(dataset$Letters, "md5", serialize = TRUE)
output<-dataset
The 'paste' function appears to iterate over rows and resolve on each row with the new input. But the 'digest' function only appears to return the first value in the table across all rows.
I don't know why the behavior of the two functions would seem to operate differently. Can anyone advise how to get the 'HashedLetters' column to resolve using the values from each row instead of just the initial one?
Use:
dataset$HashedLetters <- sapply(dataset$Letters, digest, algo = "md5", serialize = TRUE)
digest works on a whole object at a time, not individual elements of a vector.
vec <- letters[1:3]
digest::digest(vec, algo="md5", serialize=TRUE)
# [1] "38ce1fe9e19a222505e693e8bdd8aeec"
sapply(vec, digest::digest, algo="md5", serialize=TRUE)
# a b c
# "127a2ec00989b9f7faf671ed470be7f8" "ddf100612805359cd81fdc5ce3b9fbba" "6e7a8c1c098e8817e3df3fd1b21149d1"
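As a variation on the same idea, vapply can be used instead of sapply when an unnamed character column is wanted. This is only a sketch: it assumes the digest package is installed, and letters_col is an illustrative stand-in for dataset$Letters:

```r
library(digest)

# Illustrative stand-in for dataset$Letters.
letters_col <- c("A", "B", "C")

# One call hashes the whole vector as a single object: a single hash,
# which R then recycles across every row when assigned to a column.
whole <- digest(letters_col, algo = "md5", serialize = TRUE)

# vapply() calls digest() once per element and, with USE.NAMES = FALSE,
# returns a plain unnamed character vector with one hash per row.
per_row <- vapply(
  letters_col,
  digest,
  character(1),
  algo = "md5",
  serialize = TRUE,
  USE.NAMES = FALSE
)
```

Unlike sapply, vapply also fails loudly if any call returns something other than a single string.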

Unable to do homog.test in a loop

I have a problem looping through columns with this command. When printing "i", the variable name appears, but it does not substitute it in the formula. The error suggested that I can't use a variable. Any suggestions?
for (i in colnames(NMDStrokeHx)[3:14]){
print(i)
print(homog.test(i ~ AM25, data = NMDStrokeHx, method = "Levene"))
}
output:
[1] "ANCOWATO"
Error in homog.test(i ~ AM25, data = NMDStrokeHx, method = "Levene") :
The name of response variable does not match the variable names in the data.
these are the column names of the data:
> colnames(NMDStrokeHx)[3:14]
[1] "ANCOWATO" "ANMSETOT" "ANAFTOT" "ANBNTTOT" "ANDELCOR" "ANWM2TOT" "ANFULVR1" "ANVRTCOR" "ANTMASEC"
[10] "ANTMBSEC" "ANSDMTOT" "ADCDRSTG"
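A formula is not evaluated, so i ~ AM25 looks for a variable literally named i rather than the column name stored in i. You can use reformulate/as.formula to create a formula object from the string instead.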
for (i in colnames(NMDStrokeHx)[3:14]){
print(i)
print(homog.test(reformulate('AM25', i), data = NMDStrokeHx,method = "Levene"))
}
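To see what these helpers produce, here is a small base R sketch. It uses the built-in mtcars data and lm purely as stand-ins, since homog.test and NMDStrokeHx are not available here:

```r
response <- "mpg"

# reformulate(termlabels, response) builds response ~ termlabels from strings.
f1 <- reformulate("cyl", response = response)

# as.formula() parses a formula that was pasted together as text.
f2 <- as.formula(paste(response, "~ cyl"))

# Looping over column names, building a fresh formula each time.
for (v in c("mpg", "hp")) {
  fit <- lm(reformulate("cyl", response = v), data = mtcars)
  print(coef(fit))
}
```

Both f1 and f2 are the formula mpg ~ cyl, so either approach can be passed wherever a function expects a formula argument.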

Avoiding for loop, Naming Example

I would like to avoid using a for loop in the following example. The goal is to repeat a string vector multiple times, with a different suffix for each repetition. Is that possible?
str2D = mtcars
Vector = c(10,20)
Dimen = dim( str2D )
nn = c()
for ( i in Dimen[2]*(1:length(Vector)) ){
nn[ (i+1-Dimen[2]): i ] = rep(paste("|d",Vector[i/Dimen[2]],sep=""), Dimen[2] )
}
Name = paste( rep(names(str2D) , length(Vector) ),nn,sep="")
The correct result for the "Name" vector is the following:
"mpg|d10" "cyl|d10" "disp|d10" "hp|d10" "drat|d10" "wt|d10" "qsec|d10" "vs|d10" "am|d10" "gear|d10" "carb|d10" "mpg|d20" "cyl|d20" "disp|d20" "hp|d20" "drat|d20" "wt|d20" "qsec|d20" "vs|d20" "am|d20" "gear|d20" "carb|d20"
Thank you
I don't quite understand the end goal here but at least this achieves your desired output without a loop:
Name <- paste0(paste(names(mtcars)), "|d", rep(1:2, each = length(names(mtcars))), "0")
> Name
[1] "mpg|d10" "cyl|d10" "disp|d10" "hp|d10" "drat|d10" "wt|d10" "qsec|d10"
[8] "vs|d10" "am|d10" "gear|d10" "carb|d10" "mpg|d20" "cyl|d20" "disp|d20"
[15] "hp|d20" "drat|d20" "wt|d20" "qsec|d20" "vs|d20" "am|d20" "gear|d20"
[22] "carb|d20"
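The one-liner above hardcodes the values 10 and 20 as 1:2 followed by "0". A slightly more general sketch, reusing the question's own str2D and Vector so that other suffix values also work:

```r
str2D  <- mtcars
Vector <- c(10, 20)

# rep(..., each = ) repeats every suffix once per column;
# paste0() recycles the column names to match.
Name <- paste0(names(str2D), "|d", rep(Vector, each = ncol(str2D)))
```

This relies on paste0 recycling the shorter vector (the 11 column names) to the length of the longer one (the 22 repeated suffixes).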

Ordering Merged data frames

As a fairly new R programmer, I seem to have run into a strange problem, probably due to my inexperience with R.
After reading and merging successive files into a single data frame, I find that order does not sort the data as expected.
I have multiple references in each file, but each file refers to measurement data obtained at a different time.
Here's the code:
library(reshape)
# Enter file name to Read & Save data
FileName=readline("Enter File name:\n")
# Find first occurrence of file
for ( round1 in 1 : 6) {
ReadFile=paste(round1,"C_",FileName,"_Stats.csv", sep="")
if (file.exists(ReadFile))
break
}
x = data.frame(read.csv(ReadFile, header=TRUE),rnd=round1)
for ( round2 in (round1+1) : 6) {
#
ReadFile=paste(round2,"C_",FileName,"_Stats.csv", sep="")
if (file.exists(ReadFile)) {
y = data.frame(read.csv(ReadFile, header=TRUE),rnd = round2)
if (round2 == (round1 +1))
z=data.frame(merge(x,y,all=TRUE))
z=data.frame(merge(y,z,all=TRUE))
}
}
ordered = order(z$lab_id)
results = z[ordered,]
res = data.frame( lab=results[,"lab_id"],bw=results[,"ZBW"],wi=results[,"ZWI"],pf_zbw=0,pf_zwi=0,r = results[,"rnd"])
#
# Establish no of samples recorded
nsmpls = length(res[,c("lab")])
# Evaluate Z_scores for Between Lab Results
for ( i in 1 : nsmpls) {
if (res[i,"bw"] > 3 | res[i,"bw"] < -3)
res[i,"pf_zbw"]=1
}
# Evaluate Z_scores for Within Lab Results
for ( i in 1 : nsmpls) {
if (res[i,"wi"] > 3 | res[i,"wi"] < -3)
res[i,"pf_zwi"]=1
}
dd = melt(res, id=c("lab","r"), "pf_zbw")
b = cast(dd, lab ~ r)
If anyone can see why the ordering only works for about 55 of 70 records and can steer me in the right direction, I would be obliged.
Thanks very much
Check whether z$lab_id is a factor (with is.factor(z$lab_id)).
If it is, try
z$lab_id <- as.character(z$lab_id)
if it is supposed to be a character vector; or
z$lab_id <- as.numeric(as.character(z$lab_id))
if it is supposed to be a numeric vector.
Then order it again.
Ps. I had previously put these in the comments.
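A small sketch of why the factor check matters, using a made-up lab_id vector whose levels are in order of appearance (as can happen when factors from several files are combined):

```r
# Made-up IDs; the levels are deliberately not in sorted order.
lab_id <- factor(c("10", "9", "2"), levels = c("10", "9", "2"))

# order() on a factor follows the level order, not the labels.
by_factor <- order(lab_id)

# As character, sorting is alphabetical: "10" < "2" < "9".
by_character <- order(as.character(lab_id))

# As numeric (via character), sorting is by value: 2 < 9 < 10.
by_numeric <- order(as.numeric(as.character(lab_id)))

# Pitfall: as.numeric() directly on a factor returns the level codes.
codes <- as.numeric(lab_id)
```

Here by_factor is 1 2 3, by_character is 1 3 2, and by_numeric is 3 2 1, which is why the intermediate as.character step is needed before as.numeric.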

Resources