2 lines of headers in R from csv - r

I have a lot of CSV files with double headers, as below (this is only part of one file, and both headers contain important info). How can I combine the first two rows of the CSV file into a single header line (e.g. Life.expectancy.at.birth..years..1Female)?
Life.expectancy.at.birth..years..1 Life.expectancy.at.birth..years..2
1 Female Male
2 62 61
3 61 58
4 56 54
5 50 49
6 76 73

Read it twice and paste the headers together. For the second read limit the number of rows read since we really only need the header.
# for a real file, replace text = Lines in the next 2 lines with the file name, e.g. "myfile.csv", and add sep = ","
DF <- read.table(text = Lines, header = TRUE, skip = 1)
hdr1 <- read.table(text = Lines, header = TRUE, nrows = 1)
names(DF) <- paste0(names(hdr1), names(DF))
giving:
> DF
Life.expectancy.at.birth..years..1Female Life.expectancy.at.birth..years..2Male
1 62 61
2 61 58
3 56 54
4 50 49
5 76 73
Note: We used this for the input Lines:
Lines <- " Life.expectancy.at.birth..years..1 Life.expectancy.at.birth..years..2
Female Male
62 61
61 58
56 54
50 49
76 73"
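As a minimal sketch of the same read-twice idea applied to an actual comma-separated file: the temporary file and its contents below are made up for illustration, and read.csv is used instead of read.table so the comma separator is handled by default.

```r
# Hypothetical CSV with two header rows, written to a temporary file
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Life expectancy at birth (years) 1,Life expectancy at birth (years) 2",
  "Female,Male",
  "62,61",
  "61,58"
), csv_path)

# First read: only the top header is needed, so limit the rows read
hdr1 <- names(read.csv(csv_path, nrows = 1))

# Second read: skip the top header so the second row becomes the header
DF <- read.csv(csv_path, skip = 1)

# Paste the two header rows together
names(DF) <- paste0(hdr1, names(DF))
print(names(DF))
```

read.csv runs the header through make.names by default, which is why spaces and parentheses turn into dots; pass check.names = FALSE to keep the original text.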


Splitting a matrix into multiple matrices

There are two matrices:
Matrix with 2 columns: node name and node degree (k1):
Matrix with 1 column: degrees (ms):
I need to split the first matrix into multiple matrices, each containing the nodes of one degree, and then write each matrix to a CSV file. My code is not working. How can I do this correctly?
k1<-read.csv2("VandD.csv", header = FALSE)
fnk1<-as.matrix(k1)
ms<-read.csv2("mas.csv", header = FALSE)
massive<-as.matrix(ms)
wlk<-1
varbl<-1
rtt<-list()
for (wlk in 1:384) {
rtt<-NULL
stepen<-massive[wlk]
for (varbl in 1:2154) {
if(fnk1[varbl,2]==stepen){
kapa<-fnk1[varbl,1]
rtt<-append(rtt,kapa)
}
}
namef<-paste("reslt",stepen,".csv",sep = "")
write.csv2(rtt, file=namef)
}
k1
V1 V2
1 UC7Ucs42FZy3uYzjrqzOIHsw 81
2 UCyWDmyZRjrGHeKF-ofFsT5Q 81
3 UCIZP6nCTyU9VV0zIhY7q1Aw 81
4 UCqk3CdGN_j8IR9z4uBbVPSg 81
5 UCjWzQkWu0l1yAhcBoavokng 81
6 UCRXiA3h1no_PFkb1JCP0yMA 81
7 UC2w9SdXpwq2Uq-MV4W4A8kw 81
8 UCdJqTQJZleoxZFReiyNvn8w 81
9 UC2Qw1dzXDBAZPwS7zm37g8g 81
10 UCTOovOHTf4efJOmGvJBxIQQ 81
ms
V1
1 81
2 82
3 83
4 84
5 85
6 86
7 87
8 88
9 89
10 90
It seems you need split (note that the column is V2, and R is case-sensitive):
split(k1, k1$V2)
We can also use group_split:
library(dplyr)
k1 %>%
group_split(V2)
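The question also asks for one CSV file per degree. A minimal sketch of that step, assuming the column names V1 and V2 shown in the printed k1, a small made-up data frame, and a temporary output directory (out_dir is a hypothetical name):

```r
# Small stand-in for k1: node name and node degree
k1 <- data.frame(
  V1 = c("nodeA", "nodeB", "nodeC", "nodeD"),
  V2 = c(81, 81, 82, 83),
  stringsAsFactors = FALSE
)

# One data frame per distinct degree; list names are the degree values
groups <- split(k1, k1$V2)

# Write each group to its own file, e.g. reslt81.csv
out_dir <- tempdir()
for (deg in names(groups)) {
  out_file <- file.path(out_dir, paste0("reslt", deg, ".csv"))
  write.csv2(groups[[deg]], file = out_file, row.names = FALSE)
}
```

This avoids the nested loops and the hard-coded row counts, since split finds the distinct degrees itself.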

Select a range of rows from every n rows from a data frame

I have 2880 observations in my data.frame. I need to create a new data.frame containing rows 25 to 77 of every block of 96 rows.
df.new = df[seq(25, nrow(df), 77), ] # extract from 25 to 77
The above code extracts only row number 25 to 77 but I want every row from 25 to 77 in every 96 rows.
One option is to create a vector of indices with which to subset the data frame.
idx <- rep(25:77, times = nrow(df)/96) + 96*rep(0:29, each = 77-25+1)
df[idx, ]
You can use logical-vector recycling to extract these rows:
from = 25
to = 77
n = 96
df.new <- df[rep(c(FALSE, TRUE, FALSE), c(from - 1, to - from + 1, n - to)), ]
To explain how it works for this example:
length(rep(c(FALSE, TRUE, FALSE), c(24, 53, 19))) #returns
#[1] 96
Of these 96 values, positions 25 to 77 are TRUE and the rest are FALSE, which we can verify with:
which(rep(c(FALSE, TRUE, FALSE), c(24, 53, 19)))
# [1] 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46
#[23] 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
#[45] 69 70 71 72 73 74 75 76 77
Now this vector is recycled for all the remaining rows in the dataframe.
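A quick check of the recycling idea on made-up data (the id column is only there so the selected row numbers are visible); it also confirms that the logical mask picks the same rows as an explicit index vector:

```r
# Stand-in data: 2880 rows, id equals the row number
df <- data.frame(id = 1:2880)
from <- 25
to <- 77
n <- 96

# Logical mask of length 96, recycled over all 2880 rows
keep <- rep(c(FALSE, TRUE, FALSE), c(from - 1, to - from + 1, n - to))
df.new <- df[keep, , drop = FALSE]

# Explicit indices: rows 25..77 shifted by 0, 96, 192, ...
blocks <- nrow(df) / n
idx <- rep(from:to, times = blocks) + n * rep(0:(blocks - 1), each = to - from + 1)

nrow(df.new)             # 53 rows per block times 30 blocks
all(df.new$id == idx)    # both approaches select the same rows
```

drop = FALSE is used here because the stand-in has a single column; with several columns it makes no difference.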
First, define a Group variable, with values 1 to 30, each value repeating 96 times. Then define RowWithinGroup and filter as required. Finally, undo the changes introduced to do the filtering.
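library(tidyverse)
df <- tibble(X=rnorm(2880)) %>%
# 30 groups of 96 consecutive rows each
add_column(Group=rep(1:30, each=96)) %>%
group_by(Group) %>%
mutate(RowWithinGroup=row_number()) %>%
filter(RowWithinGroup >= 25 & RowWithinGroup <= 77) %>%
# ungroup first, otherwise select() keeps the grouping column
ungroup() %>%
select(-Group, -RowWithinGroup)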
Welcome to SO. This question may not have been asked in this exact form before, but the principles required have been referenced in many, many questions and answers.
A one-liner base solution.
lapply(split(df, cut(1:nrow(df), nrow(df)/96, F)), `[`, 25:77, )
Note: Nothing after the last comma
The code above returns a list. To combine all data together, just pass the result above into
do.call(rbind, ...)

R - apriori() not recognising lhs from numerical transaction

I am having real trouble getting my data to produce any rules using the arules package. I have managed to get 100000 rows of transaction data and in SAS the rules are shown. I cannot get it to work in R.
[5] {19,29,40,119,134}
[6] {24,40,45,67,141}
[7] {17,18,57,74,412}
[8] {16,79,90,150,498}
[9] {18,57,111,161,267}
[10] {11,75,131,427,429}
[11] {57,99,111,143,236}
The transactions data looks like this and originally came from a table where all the numbers were separate.
arules <- read.transactions('tid.csv', format = c("basket", "single"),
sep=",")
rules <- apriori(arules,parameter = list(supp = 0.1, conf = 0.1, target =
"rules"))
summary(rules)
For reference the supports and confidence settings make no difference. Sometimes I get this when I inspect the rules.
lhs rhs support confidence lift count
[1] {} => {8,11,96,112,432} 9.710623e-06 9.710623e-06 1 1
[2] {} => {62,134,222,254,412} 9.710623e-06 9.710623e-06 1 1
Any idea why apriori can't separate the items in the transaction? Does this need to be recast into long format, and if so, how would I do that from this data frame?
V2 V3 V4 V5 V6
8 11 96 112 432
10 35 39 76 119
18 38 68 141 267
29 36 57 61 63
19 29 40 119 134
24 40 45 67 141
17 18 57 74 412
If I understood you correctly then you should try this and let us know if it helped.
library(arules)
library(arulesViz)
#sample data
df <- read.table(text="V2 V3 V4 V5 V6
8 11 96 112 432
10 35 39 76 119
18 38 68 141 267
29 36 57 61 63
19 29 40 119 134
24 40 45 67 141
17 18 57 74 412", header=T)
write.csv(df, "apriori_demo.csv", row.names = F)
#convert sample data into transactions format for apriori algorithm
trx <- read.transactions("apriori_demo.csv", format="basket", sep=",", skip=1)
#apriori rules
apriori_rule <- apriori(trx, parameter = list(supp = 0.1, conf = 0.1))
#obviously you need to have better parameters compared to the one you have used in your post!
inspect(apriori_rule)
plot(apriori_rule, method="graph")
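On the long-format part of the question: a minimal base R sketch of reshaping the wide table into one row per (transaction, item) pair. The tid and item column names are assumptions made for illustration. A two-column layout like this is what the "single" format of read.transactions is meant for (with its cols argument naming the two columns), whereas the wide layout above corresponds to "basket".

```r
# Part of the sample data from the question
df <- read.table(text = "V2 V3 V4 V5 V6
8 11 96 112 432
10 35 39 76 119
18 38 68 141 267", header = TRUE)

# unlist() walks the data frame column by column, so the row index
# has to repeat once per column to line up with the items
long <- data.frame(
  tid  = rep(seq_len(nrow(df)), times = ncol(df)),
  item = unlist(df, use.names = FALSE)
)

# Sort so that each transaction's items sit together
long <- long[order(long$tid), ]
head(long)
```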

assign objects to dynamic lists in r

I have nested loops that produce outputs which I want to store in list objects with dynamic names. A toy example of this looks as follows:
set.seed(8020)
names<-sample(LETTERS,5,replace = F)
for(n in names)
{
#Create the list
assign(paste0("examples_",n),list())
#Populate the list
get(paste0("examples_",n))[[1]]<-sample(100,10)
get(paste0("examples_",n))[[2]]<-sample(100,10)
get(paste0("examples_",n))[[3]]<-sample(100,10)
}
Unfortunately I keep getting the error:
Error in get(paste0("examples_", n))[[1]] <- sample(100, 10) :
target of assignment expands to non-language object
I have tried all kind of assign, eval, get type of functions to parse the object, but haven't had any luck
Expanding on my comment with a worked example:
examples <- vector(mode="list", length=length(names) )
names(examples) <- names # please change that to mynames
# or almost anything other than `names`
examples <- lapply( examples, function(L) {L[[1]] <- sample(100,10)
L[[2]] <- sample(100,10)
L[[3]] <- sample(100,10); L} )
# Top of the output:
> examples
$P
$P[[1]]
[1] 34 49 6 55 19 28 72 42 14 92
$P[[2]]
[1] 97 71 63 59 66 50 27 45 76 58
$P[[3]]
[1] 94 39 77 44 73 15 51 78 97 53
$F
$F[[1]]
[1] 12 21 89 26 16 93 4 13 62 45
$F[[2]]
[1] 83 21 68 74 32 86 52 49 16 13
$F[[3]]
[1] 14 45 40 46 64 85 88 28 53 42
This mode of programming does become more natural over time. It gets you out of writing clunky for-loops all the time. Develop your algorithms for a single list-node at a time and then use sapply or lapply to iterate the processing.
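A compact variant of the same idea, as a sketch: build the named list in one step and look elements up by a name computed at run time, which replaces the assign/get pattern from the question. The variable mynames is a hypothetical rename that avoids masking the names() function.

```r
set.seed(8020)
mynames <- sample(LETTERS, 5, replace = FALSE)

# One list element per name, each holding three random draws
examples <- setNames(
  lapply(mynames, function(n) {
    replicate(3, sample(100, 10), simplify = FALSE)
  }),
  mynames
)

# Dynamic access: the name is an ordinary string, no get() needed
second_draw <- examples[[mynames[1]]][[2]]

# Dynamic update works the same way
examples[[mynames[1]]][[4]] <- sample(100, 10)
```

The original error arises because the left-hand side of an assignment must start from a variable name, and get(...) is a function call that returns a value rather than a name.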

How to remove special character from data frame

I have imported data from a url and converted it to a data frame using the following code:
url <-"http://apims.doe.gov.my/v2/hourly2.php"
tables<- readHTMLTable(url)
try<-do.call(rbind, lapply(tables, data.frame, stringsAsFactors=FALSE))
The data has '*' next to the numbers. I would like to isolate the numbers only.
So instead of
52* 45* 67* 55*
I have
52 45 67 55
I have tried several methods to get the * special character out of 3rd through 8th columns and change the column to a numeric but since this character also has a meaning in R these are not working. I have tried:
x <- "~!##$%^&*"
str_replace_all(x, as.character(try[,3:8]), " ")
I have also tried:
gsub("*","",try[,3:8])
The only functions that have identified the * characters correctly are grep and grepl, but I need another function that will use the grep output to remove the '*' special character.
grep('*',try)
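Try this. In a regular expression a bare * is a quantifier rather than a literal character, so put it in a character class: [*]$ matches a literal asterisk at the end of each value.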
dat<-do.call(rbind, lapply(tables, data.frame, stringsAsFactors=FALSE))
dat[, -(1:2)] <- sapply(dat[, -(1:2)], function(col) {
as.numeric(sub("[*]$", "", col))
})
head(dat)
# NEGERI...STATE KAWASAN.AREA MASA.TIME06.00AM MASA.TIME07.00AM MASA.TIME08.00AM MASA.TIME09.00AM MASA.TIME10.00AM MASA.TIME11.00AM
# NULL.1 Johor Kota Tinggi 52 53 52 50 50 49
# NULL.2 Johor Larkin Lama 51 51 51 NA 51 51
# NULL.3 Johor Muar 45 45 45 45 45 45
# NULL.4 Johor Pasir Gudang 56 56 55 56 56 56
# NULL.5 Kedah Alor Setar 50 50 50 50 50 49
# NULL.6 Kedah Bakar Arang, Sg. Petani NA NA NA NA NA NA
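An alternative to the character class is fixed = TRUE, which makes gsub treat the pattern as plain text. A small self-contained sketch on made-up data (the column names and values are invented for illustration, not taken from the site):

```r
# Stand-in for the scraped table: two label columns, two reading columns
dat <- data.frame(
  state = c("Johor", "Kedah"),
  area  = c("Muar", "Alor Setar"),
  h06   = c("52*", "45"),
  h07   = c("67*", "55*"),
  stringsAsFactors = FALSE
)

# Strip the literal asterisk and convert each reading column to numeric
num_cols <- 3:4
dat[num_cols] <- lapply(dat[num_cols], function(col) {
  as.numeric(gsub("*", "", col, fixed = TRUE))
})

str(dat)
```

Using lapply with dat[num_cols] keeps each column as its own vector, so the columns stay numeric even if the table has only one row.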
