Error recognising variable in dataset after attaching it - r

The following is a segment of code that I need to run so that, in later stages, I can call other functions to make histograms of the dataset. My problem is that even after correctly importing the dataset, R does not recognise the "Aboard" variable.
Viewing the dataset clearly shows that the variable exists and that the dataset has been imported properly, but whenever I run the chunks, the second chunk fails with an error saying that it does not recognise the variable. I have also tried this with another variable and it gives the same error. I do not know why this is happening or whether it is because I have missed a step. I tried to fix it by adding the line colnames(Before), but it did no good.
```{r}
setwd("~/Uni Y2/Stats/Group Project")
Before <- read.csv("before_911_no_summary.csv", header = TRUE)
After <- read.csv("after_911_no_summary.csv", header = TRUE)
colnames(Before) <- c("Date", "Location", "Aboard", "Fatalities", "Ground",
                      "Total.dead")
colnames(After) <- c("Date1", "Location1", "Aboard1", "Fatalities1", "Ground1",
                     "Total.dead1")
```
```{r}
Survivors<- (Aboard-Fatalities)
Survivors1<- (Aboard1-Fatalities1)
```
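One possible cause, offered as a sketch rather than a confirmed diagnosis: the second chunk refers to Aboard and Fatalities as if they were standalone objects, but after read.csv() they exist only as columns of Before. The toy data frame below uses made-up values in place of the real CSV, and Survivors_with is a name introduced only for illustration.

```r
# Toy stand-in for the imported CSV (values are made up for illustration)
Before <- data.frame(Aboard = c(10, 20, 30), Fatalities = c(1, 5, 30))

# Refer to the columns through the data frame ...
Survivors <- Before$Aboard - Before$Fatalities

# ... or evaluate the expression inside the data frame with with()
Survivors_with <- with(Before, Aboard - Fatalities)
```

Both forms avoid attach(), so the columns cannot be confused with other objects of the same name in the workspace.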

Related

RStudio not showing output in Console when using a variable?

Currently learning RStudio but for some reason my Console isn't outputting my code? Not sure if there was a setting I may have toggled but it seems to only affect me when I have a variable in my code?
For example, when working with the ToothGrowth dataset:
data("ToothGrowth")
View(ToothGrowth)
This executes and I can view the table of the data in a separate tab.
However, when I try to filter it
data("ToothGrowth")
View(ToothGrowth)
filtered_tg = filter(ToothGrowth, dose=0.5)
View(filtered_tg)
Nothing is returned: no formatted table opens, and only an empty line appears in the console window.
This is just one example. Even when I try something as simple as
number = 10
number
I would expect in return:
[1] 10
But the console is empty.
Looking for solutions online, I've seen suggestions that I may not have closed a bracket or parenthesis, and that I should run closeAllConnections() if a '+' is showing in the console (which it isn't).
I'm working on RStudio Cloud, so is there a reset somewhere that I could try?
Thanks for any and all help!
I changed the call to use the comparison operator == instead of =:
filtered_tg = filter(ToothGrowth, dose == 0.5)
View(filtered_tg)
After running these commands, the filtered data opened as a table in a separate tab.
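For comparison, here is a minimal sketch of the same filter in base R. It uses subset() so that it runs without loading dplyr, which is a swapped-in technique and not the code from the answer. The point is the operator: == tests each dose value, whereas = inside a function call names an argument.

```r
data("ToothGrowth")

# Keep only the rows where dose equals 0.5; note the double equals sign
filtered_tg <- subset(ToothGrowth, dose == 0.5)

# Typing an expression on its own prints its value in the console
nrow(filtered_tg)
```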

Disabling R diagnostics for Rmarkdown

Is there a way to disable R diagnostics for RMD file?
I'm running a simple preliminary code here and getting these warnings "Unknown or uninitialized column: xxx".
library(readr)
dataset = read_csv("breast_cancer.csv")
is.factor(dataset$cellularity)
is.factor(dataset$cancer_type_detailed)
A quick Google search tells me these warnings are common and have no fix. At the very least, I would like to know how to disable them in my RMD file. I also don't want to have to write warning=FALSE in every R chunk.
Link to dataset:
https://www.kaggle.com/raghadalharbi/breast-cancer-gene-expression-profiles-metabric
It's usually a bad idea to suppress warnings: very few of them are false positives. Suppressing them globally is a really bad idea.
In this case, the warnings signal that your dataset does not have columns with those names. From the str(dataset) information, we can see that there is a column named Cellularity and one named Cancer Type Detailed. Those don't match the names you used in your code, cellularity and cancer_type_detailed.
If you get a warning like this, you should examine the dataset to see what the names really are. One way is to print str(dataset); even better is to print names(dataset), which will show very clearly exactly what the names are.
There's one situation where you might not want to see that warning. If you are not sure if a column is included, you might use a test like
if (is.null(dataset$notthere)) { ... }
You'll get the warning from this code when dataset is a tibble. The way to avoid it is to use
if (is.null(dataset[["notthere"]])) ...
or
if (! "notthere" %in% names(dataset)) ...
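The name check described above can be sketched as follows. The data frame is a stand-in: its column names follow the answer, but the values are made up, and wanted, present and missing_is_null are names introduced only for this illustration.

```r
# Stand-in for the imported data; column names follow the answer, values are made up
dataset <- data.frame(
  Cellularity = c("High", "Low"),
  `Cancer Type Detailed` = c("Type A", "Type B"),
  check.names = FALSE  # keep the spaces in the column name
)

# Show the exact column names, including capitalisation and spaces
names(dataset)

# Column names are case-sensitive: only the capitalised spelling is present
wanted <- c("cellularity", "Cellularity")
present <- wanted %in% names(dataset)

# [[ ]] returns NULL for a column that is not there
missing_is_null <- is.null(dataset[["notthere"]])
```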
You can globally hide warnings like this:
```{r setup, include = FALSE}
knitr::opts_chunk$set(warning = FALSE)
```
Add this chunk at the beginning of your R Markdown file and warnings are no longer shown. You can also add other common chunk options there if you like.

psidR: build.panel() returns duplicate Error

I just tried to run the build.panel function from the psidR package. It downloaded all the rda files successfully in the first part of the script, and I put them into a separate folder. However, when I now run the function I get this error:
Error in `[.data.table`(yind, , `:=`((as.character(ind.nas)), NA)) :
  Can't assign to the same column twice in the same query (duplicates detected).
In addition: Warning message:
In `[.data.table`(tmp, , `:=`((nanames), NA_real_), with = FALSE) :
  with=FALSE ignored, it isn't needed when using :=. See ?':=' for examples.
Could this be my fault for defining my variables incorrectly? I just use the getNamesPSID function and plug the result into a data.table, similar to the example code:
library(psidR)
library(openxlsx)
library(data.table)
cwf <- read.xlsx("http://psidonline.isr.umich.edu/help/xyr/psid.xlsx")
id.ind.educ1 <- getNamesPSID("ER30010", cwf)
id.fam.income1 <- getNamesPSID("V81", cwf)
famvars1 <- data.table(year = c(1968, 1969, 1970),
                       income1 = id.fam.income1)
indvars1 <- data.table(year = c(1968, 1969, 1970),
                       educ1 = id.ind.educ1)
build.panel(datadir = "/Users/Adrian/Documents/ECON 490/Heteroskedastic Dependency/Dependency/RDA", fam.vars = famvars1, ind.vars = indvars1, sample = "SRC", design = 3)
If you omit the datadir argument, R will download the corresponding datasets to a temporary directory and print its location in the output. As long as the R process is running, you have access to that directory and can copy the files elsewhere. The error should be reproducible. The first run may take a while because it has to download the datasets.
If it relates to the NAs within each getNames argument, is there a workaround that still preserves the corresponding year so I can tell them apart in my panel?
I know there was a similar issue on the corresponding GitHub page relating to zips with the same name as one of the datasets. However, my folder contains only the correct datasets and no zips.
I also tried to exclude the NA cases, but that messed up the length of my vectors. I also tried it with a standard data.frame.
I also checked my resulting famvars / indvars data frames for duplicates with Excel, but there are none besides the NAs, which, according to the GitHub example found on https://github.com/floswald/psidR, should be included in the dataset...
Thanks so much for your help :)
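The duplicate check described above can also be done in R rather than Excel. This is only a sketch: ind_names is a hypothetical vector with made-up contents standing in for one variable-name column, and whether repeated NAs are what actually trigger the error in build.panel is an assumption the thread does not confirm.

```r
# Hypothetical variable-name column: one real name and NA for two missing waves
ind_names <- c("ER30010", NA, NA)

# duplicated() treats the second NA as a repeat of the first
dup_flags <- duplicated(ind_names)

# Duplicates among the real (non-NA) names only
dup_non_na <- ind_names[!is.na(ind_names) & duplicated(ind_names)]
```

If dup_non_na is empty while dup_flags contains TRUE, the only repeated entries are the NAs.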
EDIT: here is the traceback():
3: `[.data.table`(yind, , `:=`((as.character(ind.nas)), NA))
2: yind[, `:=`((as.character(ind.nas)), NA)]
1: build.panel(datadir = "/Users/Adrian/Documents/ECON 490/Heteroskedastic Dependency/Dependency/RDA",
       fam.vars = famvars, ind.vars = indvars, sample = "SRC", design = 3)
EDIT 2: thank you @Axeman, I cut down the reproducible example. My actual data.table contains many more variables.
UPDATE:
Just for anyone running into a similar issue:
After trying to get the function to work, I decided instead to merge all the files and data frames manually. Be prepared: it's a mammoth project, but so is any analysis of the PSID. I followed the instructions found here: http://asdfree.com/panel-study-of-income-dynamics-psid.html and combined them with helper functions of the psidR package (mainly getNamesPSID, to get the variable names in each wave). So far, it has been very successful. I only wish there were more articles on the web about how exactly the survey package works.

Suppress output, keep pander and plot in R markdown

I have the following code in my Rmd file:
library(randomForestSRC)
library(ggRandomForests)
library(ggplot2) # theme()
library(pander)  # pander()
library(caret)   # confusionMatrix()
rf_all <- rfsrc(Y ~ ., data = df, block.size = 1, ntree = 100, importance = TRUE)
plot(gg_vimp(vimp.rfsrc(rf_all))) + theme(legend.position = "none")
rf_select <- var.select.rfsrc(rf_all)
pander(rf_select$varselect)
confmat <- confusionMatrix(rf_all$class.oob, data$Enddiagnosegruppe)
pander(confmat$table)
I'm trying to create an HTML report, but I cannot for the life of me figure out what chunk options to use such that:
The output of the rfsrc functions is suppressed.
All the plots appear.
The calls to pander yield properly formatted output.
I have tried pretty much all combinations of chunk options for message, warnings, error, as well as wrapping parts of my code in invisible(), capture.output(), as well as playing around with panderOptions('knitr.auto.asis', FALSE). Nothing seems to work, either the messages are not suppressed, pander tables look weird, empty section headers appear out of nowhere (I'm guessing "##" is inserted somewhere), no luck. I feel like I'm missing the forest for the trees here. Not to mention that this code is supposed to be wrapped in a loop that generates different formulae. Any suggestions on how to get this to work?
Using the following works:
rf_select <- var.select.rfsrc(rf_all, verbose=FALSE)
cat("\n")
pander(rf_select$varselect)
The problem is that var.select.rfsrc() uses cat() instead of message(), and somehow adding a newline is necessary since otherwise the first pander table is run together with the previous line in the markdown file.
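The difference between cat() and message() can be sketched with two toy functions. noisy_cat and noisy_msg are hypothetical stand-ins, not part of randomForestSRC. The sketch shows that suppressing messages leaves cat() output untouched, because cat() writes to standard output rather than signalling a message condition.

```r
# Hypothetical stand-ins for a function that reports progress
noisy_cat <- function() { cat("progress...\n"); invisible(1) }
noisy_msg <- function() { message("progress..."); invisible(1) }

# cat() writes to standard output, so suppressMessages() does not silence it
out_cat <- capture.output(r1 <- suppressMessages(noisy_cat()))

# message() signals a condition, which suppressMessages() discards
out_msg <- capture.output(r2 <- suppressMessages(noisy_msg()))
```

Here out_cat holds the text written by cat(), while out_msg is empty.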

R Shiny - dataset load in a first chunk doesn't exist in a second chunk ...?

I have a strange error in a Shiny app I built with the learnr library: an "Object not found" error about an object I have just loaded and just visualized (which means the object exists, doesn't it?).
Although I don't have a reproducible example, some of you may understand what is causing the error:
I have a first chunk {r load} that loads a dataset. There is no error here, and I can even visualize the dataset (screenshot below).
Then I have a second chunk, where I would like to manipulate the dataset. But it tells me the dataset doesn't exist! How is that possible, when I visualized it just one chunk before?
I don't understand how a dataset can exist in one chunk and not in another. Does it mean the dataset isn't loaded in the global environment? Is it a problem with the learnr library?
Maybe someone has an idea, or something I could test. Thank you in advance.
EDIT:
The problem is with the environment/workspace. Even though I load the dataset in the first chunk, it is not stored in the environment. I called ls() in a second chunk, and it tells me there is no object in the workspace. The loaded dataset is not there, and I don't know why.
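The behaviour described in the edit can be illustrated in plain R, assuming each chunk is evaluated in its own environment. That assumption matches the symptoms but is not confirmed in the post, and chunk1_env, chunk2_env and shared_env are names invented for this sketch.

```r
# Two separate environments standing in for two chunks
chunk1_env <- new.env()
chunk2_env <- new.env()

# "Chunk 1" creates the dataset inside its own environment
evalq(dataset <- data.frame(x = 1:3), chunk1_env)

# "Chunk 2" cannot see it, because the object lives only in chunk1_env
in_chunk1 <- exists("dataset", envir = chunk1_env, inherits = FALSE)
in_chunk2 <- exists("dataset", envir = chunk2_env, inherits = FALSE)

# An object placed in a shared parent environment is visible from a child
shared_env <- new.env()
assign("dataset", data.frame(x = 1:3), envir = shared_env)
chunk3_env <- new.env(parent = shared_env)
in_chunk3 <- exists("dataset", envir = chunk3_env, inherits = TRUE)
```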
In my opinion, Shiny doesn't store any data between chunks. You have to pass it manually from one chunk to the other, as follows (showing only the server code):
server <- function(input, output, session) {
  output$heat <- renderPlotly({
    Name <- c("John", "Bob", "Jack")
    Number <- c(3, 3, 5)
    Count <- c(2, 2, 1)
    NN <- data.frame(Name, Number, Count)
    render_value(NN) # You need a function, otherwise data.frame NN is not visible
    # You can consider this as chunk 1
  })
  render_value <- function(NN) {
    # Here your loaded data is available
    head(NN)
    # You can consider this as chunk 2
  }
}
shinyApp(ui, server)
You can find full code here: Subset a dataframe based on plotly click event
OR
Create global.R file as suggested here and follow this URL: R Shiny - create global data frame at start of app

Resources