R: referencing a character string - r

Is it possible to feed R a character string and for it to know I'm looking for the data frame with that name?
Example:
TestData <- matrix(1:100,nrow=10, ncol=10)
Then, if I want to reference it later, can I do something similar to this to have R pull dataset?
paste("TestData$",x[1,],sep="")
When entered this way, it comes up as a character string and obviously returns no data. For context, I'm trying to do it this way because I'm creating a loop that goes through several data sets (and columns within those datasets), but does similar operations, so I would like to be able to dynamically change the referenced dataset.
Any help is appreciated. Thanks!

Related

Accessing individual dataframes from a split function in R

I'm new to R am trying reorganise my data based on the sampleID
I've used the split() function in R which has does exactly what I wanted it to and stored my information in new data.frames
My question is now that they are in separate data.frames how do I access them individually for further processing?
My code goes as follows
splitList.list = list()
for (i in 1:31)
{
splitList.list[[i]] = split(chromList.list[[i]], chromList.list[[i]]$sampleID)
}
splitList.list[[1]]
I take the files I have (31 files), split them and store them in a list. This much works. I get an output that looks like this
This can be repeated with any of list elements and work. I now what to do some processing on separately on each data.frame but don't know how to access just one of these. Please help

Changing data type in data frame

I have some data tables in Excel spreadsheets which I am using in R. Some of the tables store numbers as text i.e. numeric values are stored as characters.
To clarify, it is not a formatting that is a problem but numbers themselves. The Excel (and R) sees such numbers as characters such as letters, rather then numbers.
Because formatting seems to be an issue, addStyle function in openxlsx did not work for me.
After some googling, I've decided to try and write a for loop that will check each value individually.I wrote a nested for loop that checks each value and overwrites it if it is a number (code is below).This seems to work logically but values do not get overwritten i.e. values that were stored as text are still there.
library(readxl)
library(openxlsx)
wb<-loadWorkbook(choose.files())
data0<-as.data.frame(read_excel(choose.files(),sheet=1,range = "B1:E1131"))
data<-data0
for(i in 1:ncol(data)){
for(j in 1:nrow(data)){
if(is.numeric(as.numeric(data[j,i]))&&!is.na(as.numeric(data[j,i]))){
data[j,i]<-as.numeric(data[j,i])
}
}
}
Desired outcome:
I would like to change data in column "Expenses" (in a picture below) to data in a column to its right via R.
coming from my comment:
You can use the col_types-argument in the readxl::read_excel()-function to force reading of text/numeric/date/... data

Name new dataframes from character vectors - loop

I think this one is easy but I still can't figure it out and I really need help with this. I've looked everywhere but still couldn't find it.
Let's say I have this vector:
filenames <- c("fn1", "fn2", "fn3")
And I want to associate them with an dataframe that is created according to a function, that is generated at that time
df|name from filenames[i]| <- df
so it would return these dataframes
dffn1
dffn2
dffn3
I hope I made myself clear. My problem is create a new data frame and name it according to a list or whatever, in a for loop.
You can use assign to achieve what you want.
for(nms in filenames){
assign(paste('df',nms,sep=''), df) }

convert period in stata to NA in r

I have a dataset in stata and I want to take it to R, but there are some missing values in state and they are represented using a period. I want to get the data into R which I do by loading the foreign package and then I use read.table() function. How do I convert the periods in state which are genuinely missing to NA in R?
If i understand you correctly, you first load the Foreign-Package for loading a .dta-File, correct?
library("foreign")
Then you would read in your Data by using:
myRFile <- read.dta(file="someStataFile.dta")
You are asking for a way that the missing operator from Stata, often denoted by a dot ., is converted to the missing operator in R, NA, also correct?
One thing to know here is, that Stata handles missing values "behind the scenes" in multiple ways. There are actually about 27 different missing operators in Stata, which are usually not distinguishable for the user. You do not need to know them for you problem though, because read.dta() handles them itself.
To learn how you can tackle a simple problem like this yourself in the future, you always need to check the help file for your function first:
help(read.dta)
Here you see, that the function handles the extensive missing-data types from Stata automatically and correctly.
If you want to have information about which type of missing operator was recognized, you can set the argument missing.type=TRUE, by using:
myRFile <- read.dta(file="someStataFile.dta", missing.type=TRUE)
Then, according to the help file, the following will happen:
If missing.type is TRUE a separate list is created with the same
variable names as the loaded data. For string variables the list value
is NULL. For other variables the value is NA where the observation is
not missing and 0–26 when the observation is missing. This is attached
as the "missing" attribute of the returned value.

Strangeness with filtering in R and showing summary of filtered data

I have a data frame loaded using the CSV Library in R, like
mySheet <- read.csv("Table.csv", sep=";")
I now can print a summary on that mySheet object
summary(mySheet)
and it will show me a summary for each column, for example, one column named Diagnose has the unique values RCM, UCM, HCM and it shows the number of occurences of each of these values.
I now filter by a diagnose, like
subSheet <- mySheet[mySheet$Diagnose=='UCM',]
which seems to be working, when I just type subSheet in the console it will print only the rows where the value has been matched with 'UCM'
However, if I do a summary on that subSheet, like
summary(subSheet)
it still 'knows' about the other two possibilities RCM and HCM and prints those having a value of 0. However, I expected that the new created object will NOT know about the possible values of the original mySheet I initially loaded.
Is there any way to get rid of those other possible values after filtering? I also tried subset but this one just seems to be some kind of shortcut to '[' for the interactive mode... I also tried DROP=TRUE as option, but this one didn't change the game.
Totally mind squeezing :D Any help is highly appreciated!
What you are dealing with here are factors from reading the csv file. You can get subSheet to forget the missing factors with
subSheet$Diagnose <- droplevels(subSheet$Diagnose)
or
subSheet$Diagnose <- subSheet$Diagnose[ , drop=TRUE]
just before you do summary(subSheet).
Personally I dislike factors, as they cause me too many problems, and I only convert strings to factors when I really need to. So I would have started with something like
mySheet <- read.csv("Table.csv", sep=";", stringsAsFactors=FALSE)

Resources