Write Loop To Perform Function through Column Names - r

I have a dataset with a quantitative column for which I want to calculate the mean based on groups. The other columns in the dataset are titled [FY2001,FY2002,...,FY2018]. These columns are populated with either a 1 or 0.
I want to calculate the mean of the first column for each of the FY columns when they equal 1. So, I want 18 different means.
I am used to using macros in SAS where I can replace parts of a dataset name or column name using a let statement. This is my attempt at writing a loop in R to solve this problem:
vector = c("01","02","03","04","05","06","07","08","09","10",
"11","12","13","14","15","16","17","18")
varlist = paste("FY20", vector, sep = "")
abc = for (i in length(varlist)){
table(ALL_FY2$paste(varlist)[i])
}
abc
This doesn't work since it treats the paste function as a column. What am I missing? Any help would be appreciated.

We can use [[ instead of $ to subset the column by name. In addition, 'abc' should be a list, with the table output for each column assigned to the corresponding element inside the for loop.
abc <- vector("list", length(varlist)) # initialize a `list` object
Loop over the sequence of indices of 'varlist', not over length(varlist): that is a single number, so the loop would run only once, with i equal to 18.
for(i in seq_along(varlist)) abc[[i]] <- table(ALL_FY2[[varlist[i]]])
However, if we need to have a single table output from all the columns mentioned in the 'varlist', unlist the columns to a vector and replicate the sequence of columns before applying the table
ind <- rep(seq_along(varlist), each = nrow(ALL_FY2))
table(ind, unlist(ALL_FY2[varlist]))
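The question itself asks for the mean of the first column within each FY group rather than a table of counts. A minimal sketch of that step, using a small made-up data frame in place of ALL_FY2 (the column name amount and all values are assumptions for illustration):

```r
# Toy stand-in for ALL_FY2: a numeric first column plus 0/1 indicator columns.
ALL_FY2 <- data.frame(
  amount = c(10, 20, 30, 40),
  FY2001 = c(1, 0, 1, 0),
  FY2002 = c(0, 1, 1, 1)
)
varlist <- c("FY2001", "FY2002")

# For each indicator column, average the first column over the rows where it equals 1.
fy_means <- sapply(varlist, function(v) mean(ALL_FY2[[1]][ALL_FY2[[v]] == 1]))
fy_means
```

With the real data, varlist would hold all 18 column names and fy_means would be a named vector of 18 means. If the indicator columns can contain NA, wrapping the condition in which() is one way to skip those rows.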

How to create a matrix/data frame from a high number of single objects by using a loop?

I have a high number of single objects each one containing a mean value for a year. They are called cddmean1950, cddmean1951, ... ,cddmean2019.
Now I would like to put them together into a matrix or data frame with the first column being the year (1950 - 2019) and the second column being the single mean values.
This is a very long way to do it without looping:
matrix <- rbind(cddmean1950,cddmean1951,cddmean1952,...,cddmean2019)
Afterwards you transform the matrix to a data frame, create a vector with the years and add it to the data frame.
I am sure there must be a smarter and faster way to do this by using a loop or anything else?
I think this could be an easy way to do it, provided all those single objects are in your current environment.
First we would create a list of the variable names using the paste0 function
YearRange <- 1950:2019
ObjectName <- paste0("cddmean", YearRange)
Then we can use lapply together with get to fetch the values of all these variables as a list.
Then, using do.call and rbind, we can bind all these values into a single-column matrix and finally create the data frame you asked for.
ListofSingleObjects <- lapply(ObjectName, get)
MeanValues <- do.call(rbind,ListofSingleObjects)
df <- data.frame( year = YearRange , Mean = MeanValues )
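As a variation on the same idea, mget() can fetch several objects by name in one call. A small sketch, assuming each cddmean object holds a single number; the three objects and their values below are made up for illustration:

```r
# Toy objects standing in for cddmean1950, cddmean1951, ... (values are made up).
cddmean1950 <- 1.5
cddmean1951 <- 2.5
cddmean1952 <- 3.5

YearRange  <- 1950:1952
ObjectName <- paste0("cddmean", YearRange)

# mget() looks up several objects by name and returns them as a named list;
# inherits = TRUE also searches enclosing environments, as get() does by default.
MeanValues <- unlist(mget(ObjectName, inherits = TRUE))

df <- data.frame(year = YearRange, Mean = unname(MeanValues))
df
```

unlist() only gives one value per year if every object is a single number; longer objects would need a different way of combining them.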

Assigning name to rows in R

I would like to assign names to rows in R but so far I have only found ways to assign names to columns. My data is in two columns where the first column (geo) is assigned with the name of the specific location I'm investigating and the second column (skada) is the observed value at that specific location. To clarify, I want to be able to assign names for every location instead of just having them all in one .txt file so that the data is easier to work with. Anyone with more experience than me that knows how to handle this in R?
First you need to import the data to your global environment. Try the function read.table()
To name rows, try
(assuming your data.frame is named df):
rownames(df) <- df[, "geo"]
df <- df[, -1, drop = FALSE] # drop = FALSE keeps a data.frame (and its row names) when only one column is left
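A small worked example of the row-naming step on made-up data with the two columns from the question (the location names and values are assumptions). With only two columns, dropping geo without drop = FALSE would collapse the result to a plain vector and lose the row names:

```r
# Toy data with the two columns described in the question (values are made up).
df <- data.frame(
  geo   = c("north", "south", "east"),
  skada = c(3, 7, 5)
)

rownames(df) <- df[, "geo"]        # use the location names as row names
df <- df[, "skada", drop = FALSE]  # keep a data.frame so the row names survive

df["south", "skada"]               # rows can now be looked up by location
```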
Well, your question is not that clear...
I assume you are trying to create a data.frame with named rows. If you look at the data.frame help, you can see the description of the row.names parameter:
NULL or a single integer or character string specifying a column to be used as row names, or a character or integer vector giving the row names for the data frame.
which means you can either specify the row names manually when you create the data.frame, or point to the column containing the names. The former can be achieved as follows
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
row.names=letters[1:10] # take the first 10 letters and use them as row header
)
while the latter is
d = data.frame(x=rnorm(10), # 10 random data normally distributed
y=rnorm(10), # 10 random data normally distributed
r=letters[1:10], # take the first 10 letters
row.names=3 # the column with the row headers is the 3rd
)
If you are reading the data from a file, I will assume you are using the command read.table. Many of its parameters are the same as those of data.frame; in particular, you will find that the row.names parameter works the same way:
a vector of row names. This can be a vector giving the actual row names, or a single number giving the column of the table which contains the row names, or character string giving the name of the table column containing the row names.
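A short sketch of reading row names directly with read.table. The tab-separated text below is made up and stands in for the .txt file; the text argument is used only so the example runs without a file on disk:

```r
# Made-up, tab-separated content standing in for the .txt file in the question.
txt <- "geo\tskada\nnorth\t3\nsouth\t7\neast\t5"

# row.names = "geo" tells read.table() to use that column as the row names.
d <- read.table(text = txt, header = TRUE, sep = "\t", row.names = "geo")

rownames(d)
d["east", "skada"]
```

For a real file, the first argument would be the file path instead of text = txt.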
Finally, if you have already read the data into a data.frame and want to change the row names, Pierre's answer is your solution.

loop all tables in directory and import specific row based on string R

So I have to import some data into R and am finding it reasonably difficult.
I have multiple similar tables in a directory and would like to write a script that looks for a specific row (based on a string, not a row number) in each table and adds those rows to a new table.
Example data:
Table one:
name Johnny
registeration data 01012001
userid>= 47
table two:
name Jimmy
registeration data 02052005
userid>= 1972
What I want is a table that contains:
userid>= 47
userid>= 1972
Note: the columns are separated by a tab.
What I tried to do is the following:
A: Create a list of files in the working directory:
list = (Sys.glob("*.table"))
B: Created a list of tables using lapply:
table <- lapply(list, function(x) read.table(x, header = FALSE, sep = "\t", fill = TRUE))
C. Tried to grep the word "userid" (failed):
table[grep("userid", rownames("userid")), ]
Error: incorrect number of dimensions
Is there a simpler way to fetch the row of interest (userid>= in the example) based on a string, without relying on external packages? I could also use "grep userid *.table > newtable" in bash, but I want to use only R.
How about this (given that the rownames are in the first variable as in your example):
# list all tables in current directory, optionally recursively
tbls <- list.files(getwd(), '\\.table$') # pattern is a regex, so escape the dot; if more dirs, maybe add recursive = TRUE
# create a list of tables
tbls_r <- lapply(tbls,function(x) read.table(x,header=FALSE,sep='\t',stringsAsFactors = FALSE))
# using lapply to extract the row of interest
tbls_r <- lapply(tbls_r, function(x) x[x[,1] == 'userid>=',])
Using lapply, we apply a function to each element of the list (in this case, each table). x refers to a single table, so x[,1] == 'userid>=' creates a logical vector (TRUE and FALSE) marking which values of the first column are equal to the desired string. In x[,1] the row position is left empty because I want all rows but only the first column.
I then use this logical vector right away to index the table itself, returning only the rows that have a corresponding TRUE value in the vector.
# Bind the resulting rows to a single table
result_table <- do.call(rbind,tbls_r)
Hope that clears it up.
Edit:
If you just want to extract the values you can use this:
tbls_r <- sapply(tbls_r, function(x) x[x[,1] == 'userid>=',2])
In this case, I'm specifying during indexing, that I only want the column 2, leaving me with only the values.
Also I'm using sapply instead of lapply, which already returns a handy vector instead of a list. So no need for calling do.call.
If you then want a data.frame, just go with something along the lines of
res <- data.frame(UID = tbls_r,stringsAsFactors=FALSE)
Of course you can add more variables to this data.frame given they have the same length.
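To try the whole pipeline end to end, here is a self-contained sketch that first writes the two example tables from the question to a temporary directory. The file names one.table and two.table, the temporary directory, and the use of colClasses = "character" (to keep values such as 01012001 intact) are assumptions for illustration:

```r
# Write two made-up tab-separated files to a temporary directory.
dir <- tempfile("tables_")
dir.create(dir)
writeLines(c("name\tJohnny", "registeration data\t01012001", "userid>=\t47"),
           file.path(dir, "one.table"))
writeLines(c("name\tJimmy", "registeration data\t02052005", "userid>=\t1972"),
           file.path(dir, "two.table"))

# Read every *.table file; full.names = TRUE avoids depending on the working directory.
files <- list.files(dir, pattern = "\\.table$", full.names = TRUE)
tbls  <- lapply(files, read.table, header = FALSE, sep = "\t",
                stringsAsFactors = FALSE, colClasses = "character")

# Keep only the "userid>=" rows and stack them into one table.
rows   <- lapply(tbls, function(x) x[x[, 1] == "userid>=", ])
result <- do.call(rbind, rows)
result
```

Because every column is read as character here, the user ids come back as strings; as.integer(result$V2) would convert them if numbers are needed.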

Find a Value with the row number and store it into a variable

I am new in R and probably this is an easy question:
I have the following vector:
P <- c(23,45,98)
These values represent the numbers of rows
Now, I have a table with only one column and I would like to obtain the values on each row from the previous vector and return it into 3 different objects (Variables).
e.g. The row #23 has the value P05.14 and for this first value of the vector "P" I want to create a variable or object like: A = P05.14. The same with the other two values of that vector.
Thanks for your help.
If you only have the three values, just do it manually:
A <- dat[23,]
B <- dat[45,]
C <- dat[98,]
For more values, you can assign them in a loop:
for(value in P){
assign(paste0("A",value), as.character(dat[value,]))
}
I should note that in a situation such as this, it would be best to use a list and not litter the workspace with variables. But to each their own. Good luck!
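A minimal sketch of the list approach mentioned above. The one-column table dat and its contents are made up, and the element names row23, row45, row98 are just one possible naming scheme:

```r
# Made-up one-column table standing in for `dat` (100 rows of codes).
dat <- data.frame(code = sprintf("P%02d.%02d", 1:100, 1:100),
                  stringsAsFactors = FALSE)
P <- c(23, 45, 98)

# One named list instead of three separate objects in the workspace.
vals <- setNames(as.list(dat[P, 1]), paste0("row", P))

vals$row23
vals[["row98"]]
```

Each value can then be reached by name, and the whole set can be passed around or looped over as a single object.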

Changing values of multiple column elements for dataframe in R

I'm trying to update a bunch of columns by adding and subtracting SD to each value of the column. The SD is for the given column.
The below is the reproducible code that I came up with, but I feel this is not the most efficient way to do it. Could someone suggest me a better way to do this?
Essentially, there are 20 rows and 9 columns. I just need two separate dataframes: one with each column's values increased by the SD of that column, and the other with each column's values decreased by it.
##Example
##data frame containing 9 columns and 20 rows
Hi<-data.frame(replicate(9,sample(0:20,20,rep=TRUE)))
##Standard deviation calculated for each column and stored in an object - I don't know what this object is - vector, list, dataframe?
Hi_SD<-apply(Hi,2,sd)
#data frame converted to matrix to allow addition of SD to each value
Hi_Matrix<-as.matrix(Hi,rownames.force=FALSE)
#a new object created that will store values(original+1SD) for each variable
Hi_SDValues<-NULL
#variable re-created -contains sum of first column of matrix and first element of list. I have only done this for 2 columns for the purposes of this example. however, all columns would need to be recreated
Hi_SDValues$X1<-Hi_Matrix[,1]+Hi_SD[1]
Hi_SDValues$X2<-Hi_Matrix[,2]+Hi_SD[2]
#convert the object back to a dataframe
Hi_SDValues<-as.data.frame(Hi_SDValues)
##Repeat for one SD less
Hi_SDValues_Less<-NULL
Hi_SDValues_Less$X1<-Hi_Matrix[,1]-Hi_SD[1]
Hi_SDValues_Less$X2<-Hi_Matrix[,2]-Hi_SD[2]
Hi_SDValues_Less<-as.data.frame(Hi_SDValues_Less)
This is a job for sweep (type ?sweep in R for the documentation)
Hi <- data.frame(replicate(9,sample(0:20,20,rep=TRUE)))
Hi_SD <- apply(Hi,2,sd)
Hi_SD_subtracted <- sweep(Hi, 2, Hi_SD)
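sweep() subtracts by default because its FUN argument defaults to "-"; passing FUN = "+" gives the other data frame the question asks for. A short sketch producing both (the seed is set only so the made-up data is reproducible, and the names Hi_plus and Hi_minus are illustrative):

```r
set.seed(1)  # only so the made-up data is reproducible
Hi    <- data.frame(replicate(9, sample(0:20, 20, replace = TRUE)))
Hi_SD <- sapply(Hi, sd)  # one SD per column, as a named numeric vector

# MARGIN = 2 pairs each element of Hi_SD with the matching column of Hi.
Hi_plus  <- sweep(Hi, 2, Hi_SD, FUN = "+")
Hi_minus <- sweep(Hi, 2, Hi_SD, FUN = "-")
```

Both results keep the shape and column names of Hi, so no conversion to a matrix and back is needed.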
You don't need to convert the dataframe to a matrix in order to add the SD
Hi<-data.frame(replicate(9,sample(0:20,20,rep=TRUE)))
Hi_SD<-apply(Hi,2,sd) # Hi_SD is a named numeric vector
Hi_SDValues<-Hi # Creating a new dataframe that we will add the SDs to
# Loop through all columns (there are many ways to do this)
for (i in 1:9){
Hi_SDValues[,i]<-Hi_SDValues[,i]+Hi_SD[i]
}
# Do pretty much the same thing for the next dataframe
Hi_SDValues_Less <- Hi
for (i in 1:9){
Hi_SDValues_Less[,i]<-Hi_SDValues_Less[,i]-Hi_SD[i] # subtract from the second data frame, not Hi_SDValues
}
