I have part of a dataset, shown below, in CSV form; the real data has more rows and columns than shown. I want to run apriori on this dataset. Say I have this:
Maths Science C++ Java DC
[1] 75 44 55 56 88
[2] 56 88 54 78 44
The original dataset has 30 columns (one per subject) and 24 rows (one per student).
Dataset: link
I want to convert this dataset into the form shown below:
[1] {Maths,DC}
[2] {Science,Java}
i.e. a list of lists (I think that is what it is called) containing the column names. The list for each student holds the subjects in which he/she scored 75 marks or more; all other subjects are dropped (this is the only condition of the problem).
e.g. the first student scored 75+ marks in DC and Maths, so his list includes only DC and Maths.
I am sorry for posting this, but I searched a lot on Stack Overflow and found a few working suggestions, yet couldn't reach the final goal.
My goal is to get a form like this:
[9834] {semi-finished bread,
bottled water,
soda,
bottled beer}
[9835] {chicken,
tropical fruit,
other vegetables,
vinegar,
shopping bags}
As given by:
library(arules)
inspect(Groceries)
Alternatively, I would appreciate it if anyone can suggest another way to represent the data that apriori can understand, as long as it satisfies the condition stated above.
(Sorry for the long post. I hope converting my dataset to this format will help me study patterns in the student-subject data. Thanks a ton for all the help.)
library(plyr)
library(arules)
df <- read.table(text =
" 75 44 55 56 88
56 88 54 78 44")
names(df) <- c("Maths", "Science", "C++", "Java", "DC")
transactions <- as(alply(df, 1, function(x) names(x)[x >= 75]), "transactions")
inspect(transactions)
# items transactionID
# [1] {DC,Maths} 1
# [2] {Java,Science} 2
Edit: It works with your example dataset, too:
library(plyr)
library(arules)
df <- read.csv(file = url("https://drive.google.com/uc?export=download&id=0B3kdblyHw4qLR0dpT24xWUZGcGs"))
transactions <- as(alply(df, 1, function(x) names(x)[x >= 75]), "transactions")
inspect(transactions)
# items transactionID
# [1] {CD,CG,CN,DA,Data.Struc} 1
# [2] {CD,CG,CO,ML,OS} 2
# [3] {CN,Data.Struc,DC,DM,DMS} 3
# [4] {CHE,DD,DM,EC,EE} 4
# [5] {CHE,CN,MATHS,PHY} 5
# [6] {Data.Science,DM,DMS,ML,OS} 6
# [7] {CD,DA,Data.Struc,EC,MATHS} 7
# [8] {CG,CHE,CN,CO,OS} 8
# [9] {CN,CO,Data.Science,DC,DMS} 9
# [10] {DC,DD,EC,EE,PHY} 10
# [11] {CHE,DD,DMS,MATHS,PHY} 11
# [12] {CN,Data.Science,DM,MATHS,ML} 12
# [13] {CD,CG,DA,Data.Science,Data.Struc} 13
# [14] {CG,CO,EE,MATHS,OS} 14
# [15] {CN,CO,DC,DMS,PHY} 15
# [16] {CN,CO,DD,EC,EE} 16
# [17] {CHE,DA,EE,MATHS,PHY} 17
# [18] {Data.Science,DD,DM,ML,PHY} 18
# [19] {CD,CO,DA,Data.Struc,DC} 19
# [20] {CG,CO,DD,DM,OS} 20
# [21] {CG,CN,DA,DC,DMS} 21
# [22] {DD,EC,EE,ML,OS} 22
# [23] {CHE,CN,Data.Struc,MATHS,PHY} 23
# [24] {CG,Data.Science,DM,EE,ML} 24
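The per-student subject lists can also be built in base R, without plyr. The sketch below uses the two-student example from the question; the names `items` and `threshold` are illustrative and not from the original answer. The resulting list should be usable with as(items, "transactions") once arules is loaded, in the same way as the alply() result above.

```r
# Two-student example from the question
df <- read.table(text = "75 44 55 56 88
56 88 54 78 44")
names(df) <- c("Maths", "Science", "C++", "Java", "DC")

threshold <- 75  # minimum mark for a subject to be kept (illustrative name)

# One character vector of subject names per student (row)
items <- lapply(seq_len(nrow(df)), function(i) {
  names(df)[unlist(df[i, ]) >= threshold]
})

print(items)
```

Using lapply() rather than apply() avoids the result being simplified to a matrix when every student happens to have the same number of qualifying subjects.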
I am having trouble displaying Russian characters in the RStudio console. I load an Excel file containing Russian text using the readxl package. The Cyrillic displays properly in the data frame. However, if I run a function whose output includes the variable names, the RStudio console displays symbols instead of the proper Cyrillic characters.
test.xlsx contains two columns - зависимая переменная (dependent variable - numeric) and независимая переменная (independent variable, factor).
зависимая_переменная независимая_переменная
5 а
6 б
8 в
8 а
7.5 б
6 в
5 а
4 б
3 в
2 а
5 б
My code:
Sys.setlocale(locale = "Russian")
install.packages("readxl")
require(readxl)
basetable <- readxl::read_excel('test.xlsx',sheet = 1)
View(basetable)
basetable$независимая_переменная <- as.factor(basetable$независимая_переменная)
str(basetable)
This is what I get for the output of the str function:
Classes ‘tbl_df’, ‘tbl’ and 'data.frame': 11 obs. of 2 variables:
$ çàâèñèìàÿ_ïåðåìåííàÿ : num 5 6 8 8 7.5 6 5 4 3 2 ...
$ íåçàâèñèìàÿ_ïåðåìåííàÿ: Factor w/ 3 levels "а","б","в": 1 2 3 1 2 3 1 2 3 1 ...
I want to have the variable names displayed properly in Russian because I will be building many models from this data. For reference, here is my sessionInfo()
R version 3.2.3 (2015-12-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows >= 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Russian_Russia.1251 LC_CTYPE=Russian_Russia.1251
[3] LC_MONETARY=Russian_Russia.1251 LC_NUMERIC=C
[5] LC_TIME=Russian_Russia.1251
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] readxl_0.1.1 shiny_0.13.1 dplyr_0.4.3
loaded via a namespace (and not attached):
[1] Rcpp_0.12.2 digest_0.6.9 assertthat_0.1 mime_0.4
[5] chron_2.3-47 R6_2.1.2 xtable_1.8-2 jsonlite_0.9.19
[9] DBI_0.3.1 magrittr_1.5 lazyeval_0.1.10 data.table_1.9.6
[13] tools_3.2.3 httpuv_1.3.3 parallel_3.2.3 htmltools_0.3
Try changing the encoding of the data frame's column names to UTF-8:
Encoding(colnames(YOURDATAFRAME)) <- "UTF-8"
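A minimal sketch of what this assignment does, assuming the column names already hold UTF-8 bytes that are merely not marked as such. `Encoding<-` only changes the declared encoding; it does not convert the bytes. The data frame `d` and its single column are made up for illustration.

```r
# The Cyrillic letter "а" (U+0430) as raw UTF-8 bytes, with no encoding mark
nm <- rawToChar(as.raw(c(0xd0, 0xb0)))

d <- data.frame(x = 1:3)   # hypothetical data frame
colnames(d) <- nm

before <- Encoding(colnames(d))   # no mark yet, so R assumes the native encoding

# Declare the names to be UTF-8; the bytes themselves are unchanged
Encoding(colnames(d)) <- "UTF-8"
after <- Encoding(colnames(d))

print(c(before = before, after = after))
```

If the names were actually stored in another encoding (for example CP1251), re-labelling them would not help, and a conversion with iconv() would be needed instead.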
I have a data.frame in which all columns are numeric. I want to convert one integer column to a factor, but doing so converts all the other columns to class character. Is there any way to convert just one column to a factor?
The example is from Converting variables to factors in R:
myData <- data.frame(A=rep(1:2, 3), B=rep(1:3, 2), Pulse=20:25)
myData$A <-as.factor(myData$A)
The result
apply(myData,2,class)
# A B Pulse
# "character" "character" "character"
sessionInfo()
R version 3.1.2 (2014-10-31)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] splines stats graphics grDevices utils datasets methods base ...
str(myData$A)
# Factor w/ 2 levels "1","2": 1 2 1 2 1 2
Your code actually works when I test it.
This is my output from str(myData):
'data.frame': 6 obs. of 3 variables:
$ A : Factor w/ 2 levels "1","2": 1 2 1 2 1 2
$ B : int 1 2 3 1 2 3
$ Pulse: int 20 21 22 23 24 25
Your issue arises because, as ?apply states:
‘apply’ attempts to coerce
to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data
frame)
This is done before executing the function on each column. And when you run as.matrix(myData) you end up with everything forced to one class, in this case character data:
is.character(as.matrix(myData))
#[1] TRUE
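To check column classes without the matrix coercion, one option is to iterate over the data frame as a list with sapply(), which does not go through as.matrix. A short sketch using the data from the question (the name `classes` is illustrative):

```r
myData <- data.frame(A = rep(1:2, 3), B = rep(1:3, 2), Pulse = 20:25)
myData$A <- as.factor(myData$A)

# sapply() treats the data frame as a list of columns, so each
# column is inspected with its own class intact
classes <- sapply(myData, class)
print(classes)
```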
I am working intensively with the amazing ff and ffbase packages.
Due to some technical details, I have to work on my C: drive during my R session. After finishing, I move the generated files to my P: drive (using cut/paste in Windows, NOT using ff).
The problem is that when I load the ffdf object:
load.ffdf("data")
I get the error:
Error: file.access(filename, 0) == 0 is not TRUE
This is expected, because nothing told the ffdf object that its files were moved, but trying:
filename(data$x) <- "path/data_ff/x.ff"
or
pattern(data) <- "./data_ff/"
does not help, giving the error:
Error in `filename<-.ff`(`*tmp*`, value = filename) :
ff file rename from 'C:/DATA/data_ff/id.ff' to 'P:/DATA_C/data_ff/e84282d4fb8.ff' failed.
Is there any way to update the file paths stored in the ffdf object so that they point to the files' new location?
Thank you !!
If you want to 'correct' your filenames afterwards you can use:
physical(x)$filename <- "newfilename"
For example:
> a <- ff(1:20, vmode="integer", filename="./a.ff")
> saveRDS(a, "a.RDS")
> rm(a)
> file.rename("./a.ff", "./b.ff")
[1] TRUE
> b <- readRDS("a.RDS")
> b
ff (deleted) integer length=20 (20)
> physical(b)$filename <- "./b.ff"
> b[]
opening ff ./b.ff
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
Using filename() in the first session would of course have been easier. You could also have a look at the save.ffdf and corresponding load.ffdf functions in the ffbase package, which make this even simpler.
Addition
To rename the filenames of all columns in a ffdf you can use the following function:
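redir <- function(ff, newdir) {
  # Loop over the ff vectors (columns) of the ffdf that was passed in
  for (x in physical(ff)) {
    fn <- basename(filename(x))
    # Point each column at the file with the same name in newdir
    physical(x)$filename <- file.path(newdir, fn)
  }
  return(ff)
}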
You can also use ff:::clone()
R> foo <- ff(1:20, vmode = "integer")
R> foo
ff (open) integer length=20 (20)
[1] [2] [3] [4] [5] [6] [7] [8] [13] [14] [15] [16] [17] [18] [19]
1 2 3 4 5 6 7 8 : 13 14 15 16 17 18 19
[20]
20
R> physical(foo)$filename
[1] "/vol/fftmp/ff69be3e90e728.ff"
R> bar <- clone(foo, pattern = "~/")
R> bar
ff (open) integer length=20 (20)
[1] [2] [3] [4] [5] [6] [7] [8] [13] [14] [15] [16] [17] [18] [19]
1 2 3 4 5 6 7 8 : 13 14 15 16 17 18 19
[20]
20
R> physical(bar)$filename
[1] "/home/ubuntu/69be5ec0cf98.ff"
From what I understand from briefly skimming the code of save.ffdf and load.ffdf, those functions do this for you when you save/load.
I've a data.frame with row.names as in test.
test <-
c("Env_1990:trait_KPS", "Env_1990:trait_SPSM", "Env_1990:trait_TKW",
"Env_1990:trait_Yield", "Env_1991:trait_KPS", "Env_1991:trait_SPSM",
"Env_1991:trait_TKW", "Env_1991:trait_Yield", "Env_1992:trait_KPS",
"Env_1992:trait_SPSM", "Env_1992:trait_TKW", "Env_1992:trait_Yield",
"Env_1993:trait_KPS", "Env_1993:trait_SPSM", "Env_1993:trait_TKW",
"Env_1993:trait_Yield", "Env_1994:trait_KPS", "Env_1994:trait_SPSM",
"Env_1994:trait_TKW", "Env_1994:trait_Yield", "Env_1995:trait_KPS",
"Env_1995:trait_SPSM", "Env_1995:trait_TKW", "Env_1995:trait_Yield",
"Gen_B88:Env_1990:trait_KPS", "Gen_B88:Env_1990:trait_SPSM",
"Gen_B88:Env_1990:trait_TKW", "Gen_B88:Env_1990:trait_Yield",
"Gen_B88:Env_1991:trait_KPS", "Gen_B88:Env_1991:trait_SPSM",
"Gen_B88:Env_1991:trait_TKW", "Gen_B88:Env_1991:trait_Yield",
"Gen_B88:Env_1992:trait_KPS", "Gen_B88:Env_1992:trait_SPSM",
"Gen_B88:Env_1992:trait_TKW", "Gen_B88:Env_1992:trait_Yield",
"Gen_B88:Env_1993:trait_KPS", "Gen_B88:Env_1993:trait_SPSM",
"Gen_B88:Env_1993:trait_TKW", "Gen_B88:Env_1993:trait_Yield")
I want to select only those rows which start with Env_. I tried this code in R:
grep(pattern="[Env_]", x=test)
This code gives me all rows because Env_ appears in every row name. How can I select only the rows whose names start with Env_? Thanks in advance for your help.
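You want to add the ^ character, which anchors the match at the beginning of the string, and drop the square brackets, which define a character class (any single one of E, n, v or _) rather than the literal text Env_: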
> grep("^Env_", test)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
> grep("^Env_", test, value = TRUE)
[1] "Env_1990:trait_KPS" "Env_1990:trait_SPSM" "Env_1990:trait_TKW"
[4] "Env_1990:trait_Yield" "Env_1991:trait_KPS" "Env_1991:trait_SPSM"
[7] "Env_1991:trait_TKW" "Env_1991:trait_Yield" "Env_1992:trait_KPS"
[10] "Env_1992:trait_SPSM" "Env_1992:trait_TKW" "Env_1992:trait_Yield"
[13] "Env_1993:trait_KPS" "Env_1993:trait_SPSM" "Env_1993:trait_TKW"
[16] "Env_1993:trait_Yield" "Env_1994:trait_KPS" "Env_1994:trait_SPSM"
[19] "Env_1994:trait_TKW" "Env_1994:trait_Yield" "Env_1995:trait_KPS"
[22] "Env_1995:trait_SPSM" "Env_1995:trait_TKW" "Env_1995:trait_Yield"
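Since the goal is to subset a data.frame by its row names, the logical variant grepl() can be used directly as a row index. A small sketch; the data frame `d` and its `value` column are made up for illustration:

```r
# Hypothetical data frame with row names like those in the question
d <- data.frame(
  value = 1:4,
  row.names = c("Env_1990:trait_KPS", "Env_1991:trait_TKW",
                "Gen_B88:Env_1990:trait_KPS", "Gen_B88:Env_1991:trait_TKW")
)

# The bracketed pattern from the question matches every row name here
all_match <- grepl("[Env_]", rownames(d))

# The anchored pattern matches only names that begin with Env_
keep <- grepl("^Env_", rownames(d))

# drop = FALSE keeps the result a data frame even with a single column
env_rows <- d[keep, , drop = FALSE]
print(env_rows)
```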