Dataframe reading problems - r

So, I have a DataFrame generated by the following block:
url <- "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data"
adult <- read.csv(url ,strip.white = TRUE ,header = FALSE )
colnames(adult) <- c("age", "workclass", "final weight", "education", "education-num",
                     "marital-status", "occupation", "relationship", "race", "sex",
                     "capital-gain", "capital-loss", "hours-per-week", "native-country",
                     "income")
The values in the "income" column are either "<=50k" or ">50k". When I try to select the people with income ">50k", I use the following command:
richs = adult[adult["income"] == ">50k",]
However, the richs DataFrame is always empty. What am I doing wrong?
Thanks.

First, I will download the data into a data frame with strings as factors:
> adults <- read.csv('http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data', header = FALSE, stringsAsFactors = TRUE)
> str(adults)
'data.frame': 32561 obs. of 15 variables:
$ V1 : int 39 50 38 53 28 37 49 52 31 42 ...
$ V2 : Factor w/ 9 levels " ?"," Federal-gov",..: 8 7 5 5 5 5 5 7 5 5 ...
$ V3 : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
$ V4 : Factor w/ 16 levels " 10th"," 11th",..: 10 10 12 2 10 13 7 12 13 10 ...
$ V5 : int 13 13 9 7 13 14 5 9 14 13 ...
$ V6 : Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
$ V7 : Factor w/ 15 levels " ?"," Adm-clerical",..: 2 5 7 7 11 5 9 5 11 5 ...
$ V8 : Factor w/ 6 levels " Husband"," Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
$ V9 : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
$ V10: Factor w/ 2 levels " Female"," Male": 2 2 2 2 1 1 1 2 1 2 ...
$ V11: int 2174 0 0 0 0 0 0 0 14084 5178 ...
$ V12: int 0 0 0 0 0 0 0 0 0 0 ...
$ V13: int 40 13 40 40 40 40 16 45 50 40 ...
$ V14: Factor w/ 42 levels " ?"," Cambodia",..: 40 40 40 40 6 40 24 40 40 40 ...
$ V15: Factor w/ 2 levels " <=50K"," >50K": 1 1 1 1 1 1 1 2 2 2 ...
If you look at the data closely, you will notice that the feature you are working on is a factor with two levels: 1 = " <=50K" and 2 = " >50K" (note the leading space and the uppercase K in the output above). One quick way to extract the samples with level 2 of this feature is to convert it to integer and filter on that:
> richadults = adults[as.integer(adults$V15) == 2, ]
> str(richadults)
'data.frame': 7841 obs. of 15 variables:
$ V1 : int 52 31 42 37 30 40 43 40 56 54 ...
$ V2 : Factor w/ 9 levels " ?"," Federal-gov",..: 7 5 5 5 8 5 7 5 3 1 ...
$ V3 : int 209642 45781 159449 280464 141297 121772 292175 193524 216851 180211 ...
$ V4 : Factor w/ 16 levels " 10th"," 11th",..: 12 13 10 16 10 9 13 11 10 16 ...
$ V5 : int 9 14 13 10 13 11 14 16 13 10 ...
$ V6 : Factor w/ 7 levels " Divorced"," Married-AF-spouse",..: 3 5 3 3 3 3 1 3 3 3 ...
$ V7 : Factor w/ 15 levels " ?"," Adm-clerical",..: 5 11 5 5 11 4 5 11 14 1 ...
$ V8 : Factor w/ 6 levels " Husband"," Not-in-family",..: 1 2 1 1 1 1 5 1 1 1 ...
$ V9 : Factor w/ 5 levels " Amer-Indian-Eskimo",..: 5 5 5 3 2 2 5 5 5 2 ...
$ V10: Factor w/ 2 levels " Female"," Male": 2 1 2 2 2 2 1 2 2 2 ...
$ V11: int 0 14084 5178 0 0 0 0 0 0 0 ...
$ V12: int 0 0 0 0 0 0 0 0 0 0 ...
$ V13: int 45 50 40 80 40 40 45 60 40 60 ...
$ V14: Factor w/ 42 levels " ?"," Cambodia",..: 40 40 40 40 20 1 40 40 40 36 ...
$ V15: Factor w/ 2 levels " <=50K"," >50K": 2 2 2 2 2 2 2 2 2 2 ...
The new data frame (richadults) contains only the 7,841 samples whose income is >50K. The original data set has 32,561 samples.
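The integer trick depends on the order of the factor levels, so filtering by label can be easier to read. Here is a minimal sketch on a small made-up data frame (the toy rows and the name income_label are assumptions for illustration, not the real file). It also shows why the lowercase ">50k" from the question matches nothing:

```r
# Toy stand-in for the income column (hypothetical data, not the real file)
adult <- data.frame(
  age    = c(39, 50, 38, 52),
  income = c(" <=50K", " <=50K", " <=50K", " >50K"),
  stringsAsFactors = TRUE
)

# The comparison is case sensitive: the lowercase label matches no row
no_match <- adult[adult$income == ">50k", ]
nrow(no_match)

# Strip surrounding whitespace and compare against the exact label
income_label <- trimws(as.character(adult$income))
richs <- adult[income_label == ">50K", ]
nrow(richs)
```

If the file is read with strip.white = TRUE, as in the question, the trimws() step is not needed and only the capital K matters.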

Related

Access frequencies of an atomic vector in a tibble data frame

I am doing Exploratory Data Analysis on a tibble data frame. I've never used tibbles, so I'm experiencing some difficulties.
My tibble data frame has this structure:
spec_tbl_df [7,397 x 19] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
$ X1 : num [1:7397] 9617 12179 9905 5745 10067 ...
$ Administrative : num [1:7397] 5 26 4 3 7 16 4 3 2 0 ...
$ Administrative_Duration: num [1:7397] 408 1562 58 103 165 ...
$ Informational : num [1:7397] 2 9 2 0 1 3 4 5 0 0 ...
$ Informational_Duration : num [1:7397] 47.5 503.7 28.5 0 28.5 ...
$ ProductRelated : num [1:7397] 54 183 82 25 115 86 75 23 27 33 ...
$ ProductRelated_Duration: num [1:7397] 1547 9676 4729 1109 3428 ...
$ BounceRates : num [1:7397] 0 0.0111 0 0 0 ...
$ ExitRates : num [1:7397] 0.01733 0.0142 0.01454 0.00167 0.01629 ...
$ PageValues : num [1:7397] 0 19.57 9.06 61.3 4.97 ...
$ SpecialDay : num [1:7397] 0 0 0 0 0 0 0 0 0 0 ...
$ Month : Factor w/ 10 levels "Aug","Dec","Feb",..: 8 8 8 1 8 4 8 7 8 8 ...
$ OperatingSystems : Factor w/ 8 levels "1","2","3","4",..: 2 3 2 2 2 3 3 4 8 2 ...
$ Browser : Factor w/ 13 levels "1","2","3","4",..: 2 2 2 2 2 2 2 1 2 5 ...
$ Region : Factor w/ 9 levels "1","2","3","4",..: 3 2 1 6 4 8 1 1 7 3 ...
$ TrafficType : Factor w/ 19 levels "1","2","3","4",..: 2 12 2 5 10 4 2 4 2 1 ...
$ VisitorType : Factor w/ 3 levels "New_Visitor",..: 3 3 3 1 3 3 3 3 1 3 ...
$ Weekend : Factor w/ 2 levels "FALSE","TRUE": 2 1 1 1 1 1 1 1 1 1 ...
$ Revenue : Factor w/ 2 levels "FALSE","TRUE": 2 2 2 2 2 2 2 2 2 2 ...
Now if I use plot_bar to plot the categorical data (using the DataExplorer package) I have no problem. I would like, for example, to create a boxplot for the categorical variable "Month", where each month has its own boxplot showing how the values are distributed. The problem is that I can't find a way to access the frequencies. If I do the following:
boxplot(Month)
It creates a single boxplot for all the data (all the months), which is not helpful at all.
I would like the months on the x axis and the frequencies on the y axis and a boxplot for each month.
I've tried to "extract" the feature Month, transform it to a matrix and repeat the process, but it does not work.
Here is the variable Month taken alone:
> summary(x_Month)
Aug Dec Feb Jul June Mar May Nov Oct Sep
258 1034 123 259 166 1125 2014 1814 327 277
What am I missing?
Something like this would probably work to create a bar plot of the frequencies of Month:
library(ggplot2)
# spec_tbl_df stands for your tibble; passing it to ggplot() directly avoids needing %>%
ggplot(spec_tbl_df, aes(x = Month)) +
  geom_bar()
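The same idea can be sketched in base R. The frequencies come from table(), which gives a single count per month, so a bar plot fits them. A boxplot per month only makes sense for a numeric column split by Month. The data frame sessions and its values below are made up for illustration:

```r
# Hypothetical miniature of the sessions tibble (values are made up)
sessions <- data.frame(
  Month      = factor(c("Aug", "Aug", "Dec", "Dec", "Dec", "Feb")),
  PageValues = c(0, 19.6, 9.1, 61.3, 5.0, 0)
)

# Frequencies of each level: one count per month
month_counts <- table(sessions$Month)
print(month_counts)

# One count per month has no spread, so a bar plot suits it
barplot(month_counts, xlab = "Month", ylab = "Frequency")

# A boxplot per month needs a numeric variable on the y axis
boxplot(PageValues ~ Month, data = sessions)
```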

How to combine training and testing dataset in same format

I am practicing with this dataset: http://archive.ics.uci.edu/ml/datasets/Census+Income
I loaded training & testing data.
# Downloading train and test data
trainFile = "adult.data"; testFile = "adult.test"
if (!file.exists (trainFile))
download.file (url = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.data",
destfile = trainFile)
if (!file.exists (testFile))
download.file (url = "http://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.test",
destfile = testFile)
# Assigning column names
colNames = c ("age", "workclass", "fnlwgt", "education",
"educationnum", "maritalstatus", "occupation",
"relationship", "race", "sex", "capitalgain",
"capitalloss", "hoursperweek", "nativecountry",
"incomelevel")
# Reading training data
training = read.table (trainFile, header = FALSE, sep = ",",
strip.white = TRUE, col.names = colNames,
na.strings = "?", stringsAsFactors = TRUE)
# Load the testing data set
testing = read.table (testFile, header = FALSE, sep = ",",
strip.white = TRUE, col.names = colNames,
na.strings = "?", fill = TRUE, stringsAsFactors = TRUE)
I need to combine the two into one. But there is a problem: the structure of the two data frames is not the same.
Display structure of the training data
> str (training)
'data.frame': 32561 obs. of 15 variables:
$ age : int 39 50 38 53 28 37 49 52 31 42 ...
$ workclass : Factor w/ 8 levels "Federal-gov",..: 7 6 4 4 4 4 4 6 4 4 ...
$ fnlwgt : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
$ education : Factor w/ 16 levels "10th","11th",..: 10 10 12 2 10 13 7 12 13 10 ...
$ educationnum : int 13 13 9 7 13 14 5 9 14 13 ...
$ maritalstatus: Factor w/ 7 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
$ occupation : Factor w/ 14 levels "Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
$ relationship : Factor w/ 6 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
$ race : Factor w/ 5 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
$ sex : Factor w/ 2 levels "Female","Male": 2 2 2 2 1 1 1 2 1 2 ...
$ capitalgain : int 2174 0 0 0 0 0 0 0 14084 5178 ...
$ capitalloss : int 0 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int 40 13 40 40 40 40 16 45 50 40 ...
$ nativecountry: Factor w/ 41 levels "Cambodia","Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
$ incomelevel : Factor w/ 2 levels "<=50K",">50K": 1 1 1 1 1 1 1 2 2 2 ...
Display structure of the testing data
> str (testing)
'data.frame': 16282 obs. of 15 variables:
$ age : Factor w/ 74 levels "|1x3 Cross validator",..: 1 10 23 13 29 3 19 14 48 9 ...
$ workclass : Factor w/ 9 levels "","Federal-gov",..: 1 5 5 3 5 NA 5 NA 7 5 ...
$ fnlwgt : int NA 226802 89814 336951 160323 103497 198693 227026 104626 369667 ...
$ education : Factor w/ 17 levels "","10th","11th",..: 1 3 13 9 17 17 2 13 16 17 ...
$ educationnum : int NA 7 9 12 10 10 6 9 15 10 ...
$ maritalstatus: Factor w/ 8 levels "","Divorced",..: 1 6 4 4 4 6 6 6 4 6 ...
$ occupation : Factor w/ 15 levels "","Adm-clerical",..: 1 8 6 12 8 NA 9 NA 11 9 ...
$ relationship : Factor w/ 7 levels "","Husband","Not-in-family",..: 1 5 2 2 2 5 3 6 2 6 ...
$ race : Factor w/ 6 levels "","Amer-Indian-Eskimo",..: 1 4 6 6 4 6 6 4 6 6 ...
$ sex : Factor w/ 3 levels "","Female","Male": 1 3 3 3 3 2 3 3 3 2 ...
$ capitalgain : int NA 0 0 0 7688 0 0 0 3103 0 ...
$ capitalloss : int NA 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int NA 40 50 40 40 30 30 40 32 40 ...
$ nativecountry: Factor w/ 41 levels "","Cambodia",..: 1 39 39 39 39 39 39 39 39 39 ...
$ incomelevel : Factor w/ 3 levels "","<=50K.",">50K.": 1 2 2 3 3 2 2 2 3 2 ...
Problem 1:
age has become a factor in testing, and every other factor in testing has one more level than the same factor in training. This is because the first row of testing is an unnecessary row:
|1x3 Cross validator
I tried to get rid of this by re-assigning testing:
testing = testing[-1,]
but, after running the str() command again, I don't see any change.
Problem 2:
As I said before, I need to combine those two data frames into one. So I ran this:
combined <- rbind(training , testing)
Besides problem 1, I can see a new problem after running str():
> str(combined)
'data.frame': 48842 obs. of 15 variables:
$ age : chr "39" "50" "38" "53" ...
$ workclass : Factor w/ 9 levels "Federal-gov",..: 7 6 4 4 4 4 4 6 4 4 ...
$ fnlwgt : int 77516 83311 215646 234721 338409 284582 160187 209642 45781 159449 ...
$ education : Factor w/ 17 levels "10th","11th",..: 10 10 12 2 10 13 7 12 13 10 ...
$ educationnum : int 13 13 9 7 13 14 5 9 14 13 ...
$ maritalstatus: Factor w/ 8 levels "Divorced","Married-AF-spouse",..: 5 3 1 3 3 3 4 3 5 3 ...
$ occupation : Factor w/ 15 levels "Adm-clerical",..: 1 4 6 6 10 4 8 4 10 4 ...
$ relationship : Factor w/ 7 levels "Husband","Not-in-family",..: 2 1 2 1 6 6 2 1 2 1 ...
$ race : Factor w/ 6 levels "Amer-Indian-Eskimo",..: 5 5 5 3 3 5 3 5 5 5 ...
$ sex : Factor w/ 3 levels "Female","Male",..: 2 2 2 2 1 1 1 2 1 2 ...
$ capitalgain : int 2174 0 0 0 0 0 0 0 14084 5178 ...
$ capitalloss : int 0 0 0 0 0 0 0 0 0 0 ...
$ hoursperweek : int 40 13 40 40 40 40 16 45 50 40 ...
$ nativecountry: Factor w/ 42 levels "Cambodia","Canada",..: 39 39 39 39 5 39 23 39 39 39 ...
$ incomelevel : Factor w/ 5 levels "<=50K",">50K",..: 1 1 1 1 1 1 1 2 2 2 ...
The target variable (incomelevel) has 5 factor levels in the combined data frame, whereas it has 2 (which is correct) in the training data frame and 3 (one extra because of problem 1) in the testing data frame. This is because there is a . (dot) after each value of incomelevel in the testing data frame (<=50K., <=50K., >50K., ...). So I need to remove that dot, but I have no idea how. Is there a function for this?
I am very new to data and R, which is why I am facing this kind of basic issue. Can you please help me solve it?
I think you can skip the first line of the test file, which solves the issue of age being a factor; it looks like a header:
head(readLines(testFile))
[1] "|1x3 Cross validator"
[2] "25, Private, 226802, 11th, 7, Never-married, Machine-op-inspct, Own-child, Black, Male, 0, 0, 40, United-States, <=50K."
[3] "38, Private, 89814, HS-grad, 9, Married-civ-spouse, Farming-fishing, Husband, White, Male, 0, 0, 50, United-States, <=50K."
Reusing your code, we can use read.csv with skip = 1 for the test file:
colNames = c ("age", "workclass", "fnlwgt", "education",
"educationnum", "maritalstatus", "occupation",
"relationship", "race", "sex", "capitalgain",
"capitalloss", "hoursperweek", "nativecountry",
"incomelevel")
# Reading training data
training = read.csv (trainFile, header = FALSE, col.names = colNames,stringsAsFactors = TRUE,na.strings = "?",strip.white = TRUE)
testing = read.csv (testFile, header = FALSE, col.names = colNames,na.strings = "?",stringsAsFactors = TRUE,skip=1,strip.white = TRUE)
Now the income level: unfortunately we have to correct it manually. It's a good thing you checked:
testing$incomelevel = factor(gsub("\\.","",as.character(testing$incomelevel)))
We check the levels; the only difference is native country:
all.equal(sapply(testing,levels) ,sapply(training,levels))
[1] "Component “nativecountry”: Lengths (40, 41) differ (string compare on first 40)"
[2] "Component “nativecountry”: 26 string mismatches"
I don't think there's much you can do about it; maybe you have to remove that level before or after joining:
setdiff(levels(training$nativecountry),levels(testing$nativecountry))
[1] "Holand-Netherlands"
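Putting the steps together, here is a minimal sketch on made-up rows (the two small data frames are hypothetical stand-ins for the real files). It strips only a trailing dot and then relies on rbind() taking the union of factor levels, so a level that exists in only one data frame does not block the merge:

```r
# Made-up stand-ins for the two files (hypothetical rows)
training <- data.frame(
  age = c(39L, 50L),
  nativecountry = factor(c("United-States", "Holand-Netherlands")),
  incomelevel = factor(c("<=50K", ">50K"))
)
testing <- data.frame(
  age = c(25L, 38L),
  nativecountry = factor(c("United-States", "Cuba")),
  incomelevel = factor(c("<=50K.", ">50K."))
)

# Drop only a trailing dot, then rebuild the factor
testing$incomelevel <- factor(sub("\\.$", "", as.character(testing$incomelevel)))

# rbind() merges factor columns using the union of their levels
combined <- rbind(training, testing)
str(combined)
levels(combined$incomelevel)
```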

Extracting complete dataframe from Hmisc package in R

I've used aregImpute to impute the missing values, then I used the impute.transcan function to try to get a complete dataset using the following code.
impute_arg <- aregImpute(~ age + job + marital + education + default +
balance + housing + loan + contact + day + month + duration + campaign +
pdays + previous + poutcome + y , data = mov.miss, n.impute = 10 , nk =0)
imputed <- impute.transcan(impute_arg, imputation=1, data=mov.miss, list.out=TRUE, pr=FALSE, check=FALSE)
y <- completed[names(imputed)]
When I use str(y) it gives me a data frame, but it still contains NAs, as if nothing had been imputed. My question is: how do I get a complete dataset without NAs after imputation?
str(y)
'data.frame': 4521 obs. of 17 variables:
$ age : int 30 NA 35 30 NA 35 36 39 41 43 ...
$ job : Factor w/ 12 levels "admin.","blue-collar",..: 11 8 5 5 2 5 7 10 3 8 ...
$ marital : Factor w/ 3 levels "divorced","married",..: 2 2 3 2 2 3 2 2 2 2 ...
$ education: Factor w/ 4 levels "primary","secondary",..: 1 2 3 3 2 3 NA 2 3 1 ...
$ default : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 NA 1 1 1 ...
$ balance : int NA 4789 1350 1476 0 747 307 147 NA -88 ...
$ housing : Factor w/ 2 levels "no","yes": NA 2 2 2 NA 1 2 2 2 2 ...
$ loan : Factor w/ 2 levels "no","yes": 1 2 1 2 NA 1 1 NA 1 2 ...
$ contact : Factor w/ 3 levels "cellular","telephone",..: 1 1 1 3 3 1 1 1 NA 1 ...
$ day : int 19 NA 16 3 5 23 14 6 14 NA ...
$ month : Factor w/ 12 levels "apr","aug","dec",..: 11 9 1 7 9 4 NA 9 9 1 ...
$ duration : int 79 220 185 199 226 141 341 151 57 313 ...
$ campaign : int 1 1 1 4 1 2 1 2 2 NA ...
$ pdays : int -1 339 330 NA -1 176 330 -1 -1 NA ...
$ previous : int 0 4 NA 0 NA 3 2 0 0 2 ...
$ poutcome : Factor w/ 4 levels "failure","other",..: 4 1 1 4 4 1 2 4 4 1 ...
$ y : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
I have tested your code myself, and it works just fine, except for the last line:
y <- completed[names(imputed)]
I believe there's a typo in the above line. Plus, you do not even need completed at all.
Besides, if you want to get a data.frame from the impute.transcan function, then wrap it with as.data.frame:
imputed <- as.data.frame(impute.transcan(impute_arg, imputation=1, data=mov.miss, list.out=TRUE, pr=FALSE, check=FALSE))
Moreover, if you need to inspect your missing data pattern, you can also use the md.pattern function provided by the mice package.
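A quick way to confirm that the result is complete needs only base R. The sketch below uses a small made-up data frame in place of the real imputed output, and the name na_per_column is an assumption for illustration:

```r
# Hypothetical stand-in for the data frame returned after imputation
imputed <- data.frame(age = c(30, 41, 35), balance = c(1787, 4789, 1350))

# Number of missing values per column; all zeros means the data is complete
na_per_column <- colSums(is.na(imputed))
print(na_per_column)

# Single TRUE/FALSE answer for the whole data frame
any(is.na(imputed))
```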

Error in code Neuralnet package

I have to predict the TargetBuy variable, which is coded as 0 and 1.
I have the following code:
library(neuralnet)
library(NeuralNetTools)
n <- names(train)
f <- as.formula(paste("TargetBuy ~", paste(n[!n %in% "TargetBuy"], collapse = " + ")))
parse_train <- model.matrix(~ ID + DemAffl + DemAge + DemCluster +
DemClusterGroup + DemGender + DemReg +
DemTVReg + PromClass + PromSpend + PromTime +
TargetBuy,
data = train)
head(parse_train)
nn <- neuralnet(f, data = parse_train,
hidden = 2,
err.fct = "ce",
threshold = 0.01,
linear.output = FALSE)
I am getting the following error:
Error in eval(expr, envir, enclos) : object 'TargetBuy' not found
Here is the output of str(train):
'data.frame': 15556 obs. of 12 variables:
$ ID : int 140 620 868 1120 2313 2771 3131 4529 5886 7420 ...
$ DemAffl : int 10 4 5 10 11 9 11 10 14 7 ...
$ DemAge : int 76 49 70 65 68 72 74 62 43 60 ...
$ DemCluster : int 16 35 27 51 4 28 3 49 49 52 ...
$ DemClusterGroup: Factor w/ 8 levels "","A","B","C",..: 4 5 5 7 2 5 2 7 7 7 ...
$ DemGender : Factor w/ 4 levels "","F","M","U": 4 4 2 3 2 4 2 3 2 2 ...
$ DemReg : Factor w/ 6 levels "","Midlands",..: 2 2 2 2 2 3 2 2 1 3 ...
$ DemTVReg : Factor w/ 14 levels "","Border","C Scotland",..: 13 13 13 6 6 9 4 4 1 7 ...
$ PromClass : Factor w/ 4 levels "Gold","Platinum",..: 1 1 3 4 4 2 4 3 1 1 ...
$ PromSpend : num 16000 6000 0.02 0.01 0.01 ...
$ PromTime : int 4 5 8 7 8 3 8 3 1 2 ...
$ TargetBuy : Factor w/ 2 levels "0","1": 1 1 2 2 1 1 1 1 2 1 ...
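A likely cause, sketched here on made-up data: model.matrix() dummy-codes factor columns, so the factor TargetBuy comes out under a different column name and the formula can no longer find it. The tiny train data frame and the name dummy_names are assumptions for illustration:

```r
# Made-up miniature of the training data (hypothetical values)
train <- data.frame(
  DemAge    = c(76, 49, 70, 65),
  TargetBuy = factor(c("0", "0", "1", "1"))
)

# model.matrix() dummy-codes factors, so the target column is renamed
dummy_names <- colnames(model.matrix(~ DemAge + TargetBuy, data = train))
print(dummy_names)

# Converting the target to numeric 0/1 first keeps the original name
train$TargetBuy <- as.numeric(as.character(train$TargetBuy))
parse_train <- model.matrix(~ DemAge + TargetBuy, data = train)
colnames(parse_train)
```

The other factor predictors are renamed in the same way, so the formula would probably need to be built from colnames(parse_train) rather than from names(train).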

Change type of variables in multiple data frames

I have a list of data frames:
str(df.list)
List of 34
$ :'data.frame': 506 obs. of 7 variables:
..$ Protocol : Factor w/ 5 levels "P1","P2","P3",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ Time : num [1:506] 0 2 3 0.5 6 1 24 24 24 24 ...
..$ SampleID : Factor w/ 40 levels "P1T0","P1T0.5",..: 1 5 7 2 8 3 6 6 6 6 ...
..$ VolunteerID: Factor w/ 15 levels "ID-02","ID-03",..: 10 10 10 10 10 10 10 11 13 14 ...
..$ Assay : Factor w/ 1 level "ALAT": 1 1 1 1 1 1 1 1 1 1 ...
..$ ResultAssay: int [1:506] 23 23 23 24 25 24 20 34 28 17 ...
..$ Index : Factor w/ 502 levels "P1T0.5VID-02",..: 8 31 37 2 43 19 25 26 28 29 ...
$ :'data.frame': 505 obs. of 7 variables:
..$ Protocol : Factor w/ 5 levels "P1","P2","P3",..: 1 1 1 1 1 1 1 1 1 1 ...
..$ Time : num [1:505] 0 2 3 0.5 6 1 24 24 24 24 ...
..$ SampleID : Factor w/ 40 levels "P1T0","P1T0.5",..: 1 5 7 2 8 3 6 6 6 6 ...
..$ VolunteerID: Factor w/ 15 levels "ID-02","ID-03",..: 10 10 10 10 10 10 10 11 13 14 ...
..$ Assay : Factor w/ 1 level "ALB": 1 1 1 1 1 1 1 1 1 1 ...
..$ ResultAssay: int [1:505] 45 46 47 47 49 47 46 46 44 43 ...
..$ Index : Factor w/ 501 levels "P1T0.5VID-02",..: 8 31 37 2 43 19 25 26 28 29 ..
The list contains 34 data frames with equal variable names. The variables Time and ResultAssay are of the wrong type: I would like to have Time as factor and ResultAssay as numerical.
I am trying to write a function to use together with lapply to convert the variable types in this list of 34 data frames in one go, but so far I am unsuccessful.
I have tried things along the lines of:
ChangeType <- function(DF){
DF[,2] <- as.factor(DF[,2])
DF[, "ResultAssay"] <- as.numeric(DF[, c("ResultAssay")])
}
lapply(df.list, ChangeType)
What you have tried is nearly correct, but you also need to return the new data.frame and store it back in your existing variable, like so:
ChangeType <- function(DF){
  DF[,2] <- as.factor(DF[,2])
  DF[, "ResultAssay"] <- as.numeric(DF[, c("ResultAssay")])
  DF #return the data.frame
}
# store the returned value to df.list,
# thus updating your existing data.frame
df.list <- lapply(df.list, ChangeType)
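To check the result, here is a self-contained sketch on two made-up data frames standing in for the 34 in df.list. It refers to the columns by name rather than by position, which is an assumption about what is intended, not part of the original code:

```r
# Two made-up data frames standing in for the 34 in df.list
df.list <- list(
  data.frame(Time = c(0, 2, 0.5), ResultAssay = c(23L, 23L, 24L)),
  data.frame(Time = c(0, 24, 24), ResultAssay = c(45L, 46L, 44L))
)

ChangeType <- function(DF) {
  DF$Time <- as.factor(DF$Time)
  DF$ResultAssay <- as.numeric(DF$ResultAssay)
  DF  # the last expression is the function's return value
}

df.list <- lapply(df.list, ChangeType)

# Column classes of every data frame in the list
sapply(df.list, function(DF) sapply(DF, class))
```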

Resources