R code is not creating objects? - r

I have written some code for a university assignment. The assignment is based on various concrete samples and their tensile strengths. There are 20 types of concrete mixtures (made from four different accelerators, and five different plasticisers). Our job is to do a statistical analysis on this data frame:
TStrength accelerator plasticiser
1 3.417543 1 1
2 2.887113 1 2
3 3.600988 1 3
4 3.702631 1 4
5 3.686944 1 5
6 3.699785 1 1
7 3.112972 1 2
8 3.918160 1 3
9 3.600538 1 4
10 2.748832 1 5
11 3.404498 1 1
12 3.735437 1 2
13 3.347577 1 3
14 3.101556 1 4
15 3.527621 1 5
16 3.856831 1 1
17 3.492118 1 2
18 3.928343 1 3
19 3.511689 1 4
20 3.371985 1 5
21 3.069794 2 1
22 3.168010 2 2
23 3.316657 2 3
24 3.455162 2 4
25 2.818250 2 5
26 4.054507 2 1
27 3.065984 2 2
28 3.201351 2 3
29 3.417554 2 4
30 3.364320 2 5
31 3.218677 2 1
32 2.647151 2 2
33 3.222705 2 3
34 3.145210 2 4
35 3.636642 2 5
36 3.317620 2 1
37 3.645922 2 2
38 2.556071 2 3
39 3.177663 2 4
40 3.014374 2 5
41 3.838183 3 1
42 4.155951 3 2
43 3.886330 3 3
44 3.723898 3 4
45 4.425442 3 5
46 3.738460 3 1
47 3.217834 3 2
48 3.942241 3 3
49 3.699851 3 4
50 3.797089 3 5
51 3.652456 3 1
52 4.851609 3 2
53 3.359099 3 3
54 4.089559 3 4
55 4.282991 3 5
56 3.803784 3 1
57 3.519551 3 2
58 3.935084 3 3
59 3.890324 3 4
60 4.611936 3 5
61 3.343098 4 1
62 3.713952 4 2
63 3.629883 4 3
64 3.082509 4 4
65 3.346548 4 5
66 3.277845 4 1
67 3.509506 4 2
68 3.490567 4 3
69 3.235009 4 4
70 3.970925 4 5
71 3.504646 4 1
72 3.270798 4 2
73 3.547298 4 3
74 3.278489 4 4
75 3.322743 4 5
76 2.975010 4 1
77 3.384996 4 2
78 3.399486 4 3
79 3.703567 4 4
80 3.214973 4 5
My first step was to attempt to find the mean of the TStrength values for each of the 20 concrete types (there are four samples of each unique combination). I am very new to R, and my code is certainly not beautiful, but this is the code I wrote to find the means:
#Setting the correct directory
setwd("C:/Users/Matthew/Desktop/Work/Engineering")
#Creating the data frame object, Concrete.
#Note that this will only work if the file
#s...-CW.dat is in the current working directory
#Therefore for this code to work, CreateData.r must
#be run on the individual computer with the
#given matriculation number, and the file must be saved
#in the specified directory
Concrete<-read.table(file='s...-CW.dat',header=TRUE)
#Since the samples of concrete are made from 4 different accelerators and
#5 different plasticisers there will be 4*5=20 unique combinations from
#which concrete samples can come from (i.e. 1,1; 1,2; 4,5 etc).
# There are four samples of each combination
#The next section of code is used to find the mean of the four samples,
#for each combination (20 total)
#creating a list with Tstrength from all (1,1) combinations
#Then finding average
combo1 = list(Concrete[1,1],Concrete[6,1],Concrete[11,1],Concrete[16,1])
combo1mean = mean(unlist(combo1))
#Repeating for (1,2)
combo2 = list(Concrete[2,1],Concrete[7,1],Concrete[12,1],Concrete[17,1])
combo2mean = mean(unlist(combo2))
#Repeating for (1,3)
combo3 = list(Concrete[3,1],Concrete[8,1],Concrete[13,1],Concrete[18,1])
combo3mean = mean(unlist(combo3))
#Repeating for (1,4)
combo4 = list(Concrete[4,1],Concrete[9,1],Concrete[14,1],Concrete[19,1])
combo4mean = mean(unlist(combo4))
#Repeating for (1,5)
combo5 = list(Concrete[5,1],Concrete[10,1],Concrete[15,1],Concrete[20,1])
combo5mean = mean(unlist(combo5))
#Repeating for (2,1)
combo6 = list(Concrete[21,1],Concrete[26,1],Concrete[31,1],Concrete[36,1])
combo6mean = mean(unlist(combo6))
#Repeating for (2,2)
combo7 = list(Concrete[22,1],Concrete[27,1],Concrete[32,1],Concrete[37,1])
combo7mean = mean(unlist(combo7))
#Repeating for (2,3)
combo8 = list(Concrete[23,1],Concrete[28,1],Concrete[33,1],Concrete[38,1])
combo8mean = mean(unlist(combo8))
#Repeating for (2,4)
combo9 = list(Concrete[24,1],Concrete[29,1],Concrete[34,1],Concrete[39,1])
combo9mean = mean(unlist(combo9))
#Repeating for (2,5)
combo10 = list(Concrete[25,1],Concrete[30,1],Concrete[35,1],Concrete[40,1])
combo10mean = mean(unlist(combo10))
#Repeating for (3,1)
combo11 = list(Concrete[41,1],Concrete[46,1],Concrete[51,1],Concrete[56,1])
combo11mean = mean(unlist(combo11))
#Repeating for (3,2)
combo12 = list(Concrete[42,1],Concrete[47,1],Concrete[52,1],Concrete[57,1])
combo12mean = mean(unlist(combo12))
#Repeating for (3,3)
combo13 = list(Concrete[43,1],Concrete[48,1],Concrete[53,1],Concrete[58,1])
combo13mean = mean(unlist(combo13))
#Repeating for (3,4)
combo14 = list(Concrete[44,1],Concrete[49,1],Concrete[54,1],Concrete[59,1])
combo14mean = mean(unlist(combo14))
#Repeating for (3,5)
combo15 = list(Concrete[45,1],Concrete[50,1],Concrete[55,1],Concrete[60,1])
combo15mean = mean(unlist(combo15))
#Repeating for (4,1)
combo16 = list(Concrete[61,1],Concrete[66,1],Concrete[71,1],Concrete[76,1])
combo16mean = mean(unlist(combo16))
#Repeating for (4,2)
combo17 = list(Concrete[62,1],Concrete[67,1],Concrete[72,1],Concrete[77,1])
combo17mean = mean(unlist(combo17))
#Repeating for (4,3)
combo18 = list(Concrete[63,1],Concrete[68,1],Concrete[73,1],Concrete[78,1])
combo18mean = mean(unlist(combo18))
#Repeating for (4,4)
combo19 = list(Concrete[64,1],Concrete[69,1],Concrete[74,1],Concrete[79,1])
combo19mean = mean(unlist(combo19))
#Repeating for (4,5)
combo20 = list(Concrete[65,1],Concrete[70,1],Concrete[75,1],Concrete[80,1])
combo20mean = mean(unlist(combo20))
A few notes about the code: "s..." is just my matriculation number. I have triple-checked that I have not made a mistake regarding either the file name or the directory where it is stored. CreateData.r is just a script provided to us that generates the data used to create 'Concrete' based on our matriculation number (so we're not just blindly copying each other, I suppose).
The problem I am having with the code is that whenever it runs, the object Concrete is created, as is combo1mean, combo2mean and combo3mean. However, I just cannot figure out why the rest of the objects aren't being created.
I have had no success running the script in the Rgui. After running the script, I check that Concrete has initialised, and I check whether combo4mean and above have initialised too, but they never do. I thought it might have to do with running the wrong file, or that I hadn't saved the data properly, but the script definitely contains all the code. I also created a new file to see if that would work, but unfortunately it didn't. Also, I have read An Introduction to R by W.N. Venables, D.M. Smith and the R Core Team, but nothing there has helped me figure this out.
PS I am not doing this as an easy way out of homework. I have genuinely tried to figure out what is going wrong but I cannot seem to find the problem. I also apologise if the question is inaccurate in any way, or if I have misunderstood anything. I am very new to R and am trying my best to learn it! Cheers in advance.
EDIT: Just in case anyone is curious, I managed to get the exact same code to work on a different computer, starting from an empty workspace. I'm still not very sure why it didn't work on the first computer, but thanks 42 for the code suggestions.

Adding code that should bypass issues related to reading a text file. This should succeed on any R installation:
Concrete <- read.table(text="TStrength accelerator plasticiser
1 3.417543 1 1
2 2.887113 1 2
3 3.600988 1 3
4 3.702631 1 4
5 3.686944 1 5
6 3.699785 1 1
7 3.112972 1 2
8 3.918160 1 3
9 3.600538 1 4
10 2.748832 1 5
11 3.404498 1 1
12 3.735437 1 2
13 3.347577 1 3
14 3.101556 1 4
15 3.527621 1 5
16 3.856831 1 1
17 3.492118 1 2
18 3.928343 1 3
19 3.511689 1 4
20 3.371985 1 5
21 3.069794 2 1
22 3.168010 2 2
23 3.316657 2 3
24 3.455162 2 4
25 2.818250 2 5
26 4.054507 2 1
27 3.065984 2 2
28 3.201351 2 3
29 3.417554 2 4
30 3.364320 2 5
31 3.218677 2 1
32 2.647151 2 2
33 3.222705 2 3
34 3.145210 2 4
35 3.636642 2 5
36 3.317620 2 1
37 3.645922 2 2
38 2.556071 2 3
39 3.177663 2 4
40 3.014374 2 5
41 3.838183 3 1
42 4.155951 3 2
43 3.886330 3 3
44 3.723898 3 4
45 4.425442 3 5
46 3.738460 3 1
47 3.217834 3 2
48 3.942241 3 3
49 3.699851 3 4
50 3.797089 3 5
51 3.652456 3 1
52 4.851609 3 2
53 3.359099 3 3
54 4.089559 3 4
55 4.282991 3 5
56 3.803784 3 1
57 3.519551 3 2
58 3.935084 3 3
59 3.890324 3 4
60 4.611936 3 5
61 3.343098 4 1
62 3.713952 4 2
63 3.629883 4 3
64 3.082509 4 4
65 3.346548 4 5
66 3.277845 4 1
67 3.509506 4 2
68 3.490567 4 3
69 3.235009 4 4
70 3.970925 4 5
71 3.504646 4 1
72 3.270798 4 2
73 3.547298 4 3
74 3.278489 4 4
75 3.322743 4 5
76 2.975010 4 1
77 3.384996 4 2
78 3.399486 4 3
79 3.703567 4 4
80 3.214973 4 5", header=TRUE)
This probably does what you are attempting with about a tenth of the code or less (and, more importantly, no errors):
> means.by.type <- with( Concrete, tapply(TStrength,
list( acc=accelerator, plas=plasticiser),
FUN=mean))
> means.by.type
plas
acc 1 2 3 4 5
1 3.594664 3.306910 3.698767 3.479103 3.333845
2 3.415150 3.131767 3.074196 3.298897 3.208397
3 3.758221 3.936236 3.780689 3.850908 4.279364
4 3.275150 3.469813 3.516808 3.324893 3.463797
Importantly, you did not show the output of str or dput on Concrete, so I cannot really tell whether your problem is data prep or coding.
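For reference, here is a minimal sketch of the kind of diagnostic output meant here. The small data frame is rebuilt by hand from the first four rows of the posted data purely for illustration; on the real problem you would run the same calls on the object returned by read.table.

```r
# First four rows of the posted data, rebuilt by hand for illustration only
Concrete <- data.frame(
  TStrength   = c(3.417543, 2.887113, 3.600988, 3.702631),
  accelerator = c(1L, 1L, 1L, 1L),
  plasticiser = c(1L, 2L, 3L, 4L)
)

# Show the column types and dimensions of the object
str(Concrete)

# Print copy-pasteable text that recreates the object exactly
dput(head(Concrete))

# Check whether an object that should have been created is actually there
exists("combo4mean")
```

Pasting the str and dput output into the question lets others see whether TStrength was read as numeric and whether all 80 rows arrived.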

Related

Finding the k-largest clusters in dbscan result

I have a dataframe df, consists of 2 columns: x and y coordinates.
Each row refers to a point.
I feed it into dbscan function to obtain the clusters of the points in df.
library("fpc")
db = fpc::dbscan(df, eps = 0.08, MinPts = 4)
plot(db, df, main = "DBSCAN", frame = FALSE)
By using print(db), I can see the result returned by dbscan.
> print(db)
dbscan Pts=13131 MinPts=4 eps=0.08
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
border 401 38 55 5 2 3 0 0 0 8 0 6 1 3 1 3 3 2 1 2 4 3
seed 0 2634 8186 35 24 561 99 7 22 26 5 75 17 9 9 54 1 2 74 21 3 15
total 401 2672 8241 40 26 564 99 7 22 34 5 81 18 12 10 57 4 4 75 23 7 18
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44
border 4 1 2 6 2 1 3 7 2 1 2 3 11 1 3 1 3 2 5 5 1 4 3
seed 14 9 4 48 2 4 38 111 5 11 5 14 111 6 1 5 1 8 3 15 10 15 6
total 18 10 6 54 4 5 41 118 7 12 7 17 122 7 4 6 4 10 8 20 11 19 9
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68
border 2 4 2 1 3 2 1 1 3 1 0 2 2 3 0 3 3 3 3 0 0 2 3 1
seed 15 2 9 11 4 8 12 4 6 8 7 7 3 3 4 3 3 4 2 9 4 2 1 4
total 17 6 11 12 7 10 13 5 9 9 7 9 5 6 4 6 6 7 5 9 4 4 4 5
69 70 71
border 3 3 3
seed 1 1 1
total 4 4 4
From the above summary, I can see that cluster 2 consists of 8186 seed points (core points), cluster 1 consists of 2634 seed points and cluster 5 consists of 561 seed points.
I define the largest cluster as the one that contains the largest number of seed points. So, in this case, the largest cluster is cluster 2, and the 1st, 2nd and 3rd largest clusters are 2, 1 and 5.
Is there any direct way to return the rows (points) in the largest cluster, or in the k-th largest cluster in general?
I can do it in an indirect way.
I can obtain the assigned cluster number of each point with db$cluster.
Hence, I can create a new dataframe df2 with db$cluster as an additional column besides the original x and y columns.
Then, I can aggregate df2 by the cluster numbers in the third column and find the number of points in each cluster.
After that, I can find the k largest groups, which are 2, 1 and 5 again.
Finally, I can select the rows in df2 whose third column equals 2 to return the points in the largest cluster.
But the above approach re-computes many known results as stated in the summary of print(db).
The dbscan function doesn't appear to retain the data.
library(fpc)
set.seed(665544)
n <- 600
df <- data.frame(x=runif(10, 0, 10)+rnorm(n, sd=0.2), y=runif(10, 0, 10)+rnorm(n,sd=0.2))
(dbs <- dbscan(df, 0.2))
#dbscan Pts=600 MinPts=5 eps=0.2
# 0 1 2 3 4 5 6 7 8 9 10 11
#border 28 4 4 8 5 3 3 4 3 4 6 4
#seed 0 50 53 51 52 51 54 54 54 53 51 1
#total 28 54 57 59 57 54 57 58 57 57 57 5
attributes(dbs)
#$names
#[1] "cluster" "eps" "MinPts" "isseed"
#$class
#[1] "dbscan"
Your indirect steps are not that indirect (only two lines needed), and these commands won't recalculate the clusters. So just run those commands, or put them in a function and then call the function in one command.
cluster_k <- function(dbs, data, k){
kth <- names(rev(sort(table(dbs$cluster)))[k])
data[dbs$cluster == kth,]
}
cluster_k(dbs=dbs, data=df, k=1)
## x y
## 3 6.580695 8.715245
## 13 6.704379 8.528486
## 23 6.809558 8.160721
## 33 6.375842 8.756433
## 43 6.603195 8.640206
## 53 6.728533 8.425067
## a data frame with 59 rows
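Note that the function above ranks clusters by total size and counts the noise group (cluster 0) like any other cluster. A minimal sketch that follows the question's definition instead, ranking by number of seed points and skipping cluster 0, might look like the following. It assumes only the cluster and isseed components listed by attributes(dbs); the function name cluster_k_seed and the toy mock object are hypothetical and made up for illustration.

```r
# Return the rows of `data` that belong to the cluster with the k-th largest
# number of seed (core) points. Cluster 0 (noise) is never ranked.
cluster_k_seed <- function(dbs, data, k = 1) {
  # Count seed points per cluster, excluding noise
  seed_counts <- table(dbs$cluster[dbs$isseed & dbs$cluster != 0])
  # Cluster labels ordered from most to fewest seed points
  ranked <- names(sort(seed_counts, decreasing = TRUE))
  # Keep every point (seed or border) assigned to the chosen cluster
  data[dbs$cluster == as.integer(ranked[k]), , drop = FALSE]
}

# Toy stand-in for a dbscan result (hypothetical values):
# cluster 1 has 3 seed points, cluster 2 has 4 points but only 2 seeds
mock <- list(
  cluster = c(0, 1, 1, 1, 2, 2, 2, 2, 0),
  isseed  = c(FALSE, TRUE, TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE)
)
pts <- data.frame(x = 1:9, y = 9:1)

largest <- cluster_k_seed(mock, pts, k = 1)
```

With a real result the call would be cluster_k_seed(dbs, df, k = 1). The sketch does not guard against k being larger than the number of clusters.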

Using tidyverse gather() to output multiple value vectors with a single key in a data frame

Despite the conventions of R, data collection and entry is for me most easily done in vertical columns. Therefore, I have a question about efficiently converting to horizontal rows with the gather() function in the tidyverse library. I find myself using gather() over and over which seems inefficient. Is there a more efficient way? And can an existing vector serve as the key? Here is an example:
Let's say we have the following health metrics on baby birds.
bird day_1_mass day_2_mass day_1_heart_rate day_3_heart_rate
1 1 5 6 60 55
2 2 6 8 62 57
3 3 3 3 45 45
Using the gather function I can reorganize the mass data into rows.
horizontal.data <- gather(vertical.data,
key = age,
value = mass,
day_1_mass:day_2_mass,
factor_key=TRUE)
Giving us
bird day_1_heart_rate day_3_heart_rate age mass
1 1 60 55 day_1_mass 5
2 2 62 57 day_1_mass 6
3 3 45 45 day_1_mass 3
4 1 60 55 day_2_mass 6
5 2 62 57 day_2_mass 8
6 3 45 45 day_2_mass 3
And use the same function again to similarly reorganize heart rate data.
horizontal.data.2 <- gather(horizontal.data,
key = age2,
value = heart_rate,
day_1_heart_rate:day_3_heart_rate,
factor_key=TRUE)
Producing a new dataframe
bird age mass age2 heart_rate
1 1 day_1_mass 5 day_1_heart_rate 60
2 2 day_1_mass 6 day_1_heart_rate 62
3 3 day_1_mass 3 day_1_heart_rate 45
4 1 day_2_mass 6 day_1_heart_rate 60
5 2 day_2_mass 8 day_1_heart_rate 62
6 3 day_2_mass 3 day_1_heart_rate 45
7 1 day_1_mass 5 day_3_heart_rate 55
8 2 day_1_mass 6 day_3_heart_rate 57
9 3 day_1_mass 3 day_3_heart_rate 45
10 1 day_2_mass 6 day_3_heart_rate 55
11 2 day_2_mass 8 day_3_heart_rate 57
12 3 day_2_mass 3 day_3_heart_rate 45
So it took two steps, but it worked. The questions are 1) Is there a way to do this in one step? and 2) Can it alternatively be done with one key (the "age" vector) that I can then simply replace as numeric data?
If I understand the question correctly, you could do that by first gathering everything together, and then "spreading" on mass and heart rate:
library(forcats)
library(dplyr)
library(tidyr)
mass_levs <- names(vertical.data)[grep("mass", names(vertical.data))]
hearth_levs <- names(vertical.data)[grep("heart", names(vertical.data))]
horizontal.data <- vertical.data %>%
gather(variable, value, -bird, factor_key = TRUE) %>%
mutate(day = stringr::str_sub(variable, 5,5)) %>%
mutate(variable = fct_collapse(variable,
"mass" = mass_levs,
"hearth_rate" = hearth_levs)) %>%
spread(variable, value)
, giving:
bird day mass hearth_rate
1 1 1 5 60
2 1 2 6 NA
3 1 3 NA 55
4 2 1 6 62
5 2 2 8 NA
6 2 3 NA 57
7 3 1 3 45
8 3 2 3 NA
9 3 3 NA 45
We can see how it works by going through the pipe one step at a time.
First, we gather everything into a long format:
horizontal.data <- vertical.data %>%
gather(variable, value, -bird, factor_key = TRUE)
bird variable value
1 1 day_1_mass 5
2 2 day_1_mass 6
3 3 day_1_mass 3
4 1 day_2_mass 6
5 2 day_2_mass 8
6 3 day_2_mass 3
7 1 day_1_heart_rate 60
8 2 day_1_heart_rate 62
9 3 day_1_heart_rate 45
10 1 day_3_heart_rate 55
11 2 day_3_heart_rate 57
12 3 day_3_heart_rate 45
Then, if we want to keep a "proper" long table, as the OP suggested, we have to create a single key variable. In this case, it makes sense to use the day (= age). To create the day variable, we can extract it from the character strings now in variable:
%>% mutate(day = stringr::str_sub(variable, 5,5))
Here, str_sub extracts the character at position 5, which is the day (note that if the full dataset has multi-digit days, you'll have to tweak this a bit, probably by splitting on _):
bird variable value day
1 1 day_1_mass 5 1
2 2 day_1_mass 6 1
3 3 day_1_mass 3 1
4 1 day_2_mass 6 2
5 2 day_2_mass 8 2
6 3 day_2_mass 3 2
7 1 day_1_heart_rate 60 1
8 2 day_1_heart_rate 62 1
9 3 day_1_heart_rate 45 1
10 1 day_3_heart_rate 55 3
11 2 day_3_heart_rate 57 3
12 3 day_3_heart_rate 45 3
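As a hedged illustration of the multi-digit tweak mentioned above, here are two base R ways to pull the day out of such names. The name "day_12_mass" is a hypothetical example that does not occur in the posted data.

```r
# Example column names; "day_12_mass" is made up to show a two-digit day
variable <- c("day_1_mass", "day_12_mass", "day_3_heart_rate")

# Option 1: capture the run of digits that follows "day_"
day_regex <- as.integer(sub("^day_([0-9]+)_.*$", "\\1", variable))

# Option 2: split on "_" and take the second piece of each name
pieces <- strsplit(variable, "_", fixed = TRUE)
day_split <- as.integer(vapply(pieces, `[`, character(1), 2))
```

Either expression could replace the str_sub call inside mutate(), as long as every name starts with "day_" followed by the day number.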
Now, to finish, we have to "spread" the table to get a mass column and a heart rate column.
Here we have a problem, because the variable column currently has two levels each for mass and heart rate. Therefore, applying spread on variable would again give us four columns.
To prevent that, we need to collapse the four levels in variable into two. We can do that using forcats::fct_collapse, providing the mapping between the new level names and the "old" ones. Outside of a pipe, that would correspond to:
horizontal.data$variable <- fct_collapse(horizontal.data$variable,
mass = c("day_1_mass", "day_2_mass"),
hearth_rate = c("day_1_heart_rate", "day_3_heart_rate"))
However, if you have many levels it is cumbersome to write them all out. Therefore, I first find the level names corresponding to the two "categories" using
mass_levs <- names(vertical.data)[grep("mass", names(vertical.data))]
hearth_levs <- names(vertical.data)[grep("heart", names(vertical.data))]
mass_levs
[1] "day_1_mass" "day_2_mass"
hearth_levs
[1] "day_1_heart_rate" "day_3_heart_rate"
therefore, the third line of the pipe can be shortened to:
%>% mutate(variable = fct_collapse(variable,
"mass" = mass_levs,
"hearth_rate" = hearth_levs))
, after which we have:
bird variable value day
1 1 mass 5 1
2 2 mass 6 1
3 3 mass 3 1
4 1 mass 6 2
5 2 mass 8 2
6 3 mass 3 2
7 1 hearth_rate 60 1
8 2 hearth_rate 62 1
9 3 hearth_rate 45 1
10 1 hearth_rate 55 3
11 2 hearth_rate 57 3
12 3 hearth_rate 45 3
, so that we are now in the condition to "spread" the table again according to variable using:
%>% spread(variable, value)
bird day mass hearth_rate
1 1 1 5 60
2 1 2 6 NA
3 1 3 NA 55
4 2 1 6 62
5 2 2 8 NA
6 2 3 NA 57
7 3 1 3 45
8 3 2 3 NA
9 3 3 NA 45
HTH
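On the "one step" part of the question: with tidyr 1.0.0 or later, pivot_longer() can do the gather, day extraction and spread in a single call by using the special ".value" entry in names_to. The sketch below assumes that tidyr version is installed and that every column other than bird is named day_<number>_<measure>; the object name long.data is made up for illustration.

```r
library(tidyr)

# The example data from the question
vertical.data <- data.frame(
  bird = 1:3,
  day_1_mass = c(5, 6, 3),
  day_2_mass = c(6, 8, 3),
  day_1_heart_rate = c(60, 62, 45),
  day_3_heart_rate = c(55, 57, 45)
)

# The first capture group becomes the "day" key; the second group
# (".value") names the output columns, here mass and heart_rate
long.data <- pivot_longer(
  vertical.data,
  cols = -bird,
  names_to = c("day", ".value"),
  names_pattern = "day_([0-9]+)_(.*)"
)

# The day is extracted as text, so convert it to a number
long.data$day <- as.integer(long.data$day)
```

Combinations that were never measured, such as heart rate on day 2, come out as NA, just as in the spread() result above.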
If you insist on a single command, I can give you one.
Set up the data.table:
library(data.table)
c1<-c(1,2,3)
c2<-c(5,6,3)
c3<-c(6,8,3)
c4<-c(60,62,45)
c5<-c(55,57,45)
dt<-as.data.table(cbind(c1,c2,c3,c4,c5))
colnames(dt)<-c("bird","day_1_mass","day_2_mass","day_1_heart_rate","day_3_heart_rate")
Now use this single command to get the final outcome
merge(melt(dt[,c("bird","day_1_mass","day_2_mass")],id.vars = c("bird"),variable.name = "age",value.name="mass"),melt(dt[,c("bird","day_1_heart_rate","day_3_heart_rate")],id.vars = c("bird"),variable.name = "age2",value.name="heart_rate"),by = "bird")
The final outcome is
bird age mass age2 heart_rate
1: 1 day_1_mass 5 day_1_heart_rate 60
2: 1 day_1_mass 5 day_3_heart_rate 55
3: 1 day_2_mass 6 day_1_heart_rate 60
4: 1 day_2_mass 6 day_3_heart_rate 55
5: 2 day_1_mass 6 day_1_heart_rate 62
6: 2 day_1_mass 6 day_3_heart_rate 57
7: 2 day_2_mass 8 day_1_heart_rate 62
8: 2 day_2_mass 8 day_3_heart_rate 57
9: 3 day_1_mass 3 day_1_heart_rate 45
10: 3 day_1_mass 3 day_3_heart_rate 45
11: 3 day_2_mass 3 day_1_heart_rate 45
12: 3 day_2_mass 3 day_3_heart_rate 45
Though already answered, I have a different solution in which you save a list of the gather parameters you would like to run, and then run the gather_() command for each set of parameters in the list.
# Create a list of gather parameters
# Format is key, value, columns_to_gather
gather.list <- list(c("age", "mass", "day_1_mass", "day_2_mass"),
c("age2", "heart_rate", "day_1_heart_rate", "day_3_heart_rate"))
# Run gather command for each list item
for(i in gather.list){
df <- gather_(df, key_col = i[1], value_col = i[2], gather_cols = c(i[3:length(i)]), factor_key = TRUE)
}

Coloring dgCMatrix image by factor in R

I am trying to color a sparse matrix image according to a grouping factor. I know the solution is related to matrix coloring in the lattice package, but I am having trouble getting it to work.
I have a list of hits on an app list. Every hit is related to a user and an app at a specific time.
- On the y axis are users sorted by first install of the app
Every user then has a new line for his pages hits
- On the x axis is the time
Points are hits
Here is a preview of the data:
library(Matrix)
indexUser indexInstall time
1 1 1 3
2 1 1 17
3 1 1 19
4 1 1 32
5 1 1 81
6 1 1 86
7 1 1 124
8 1 1 231
9 1 1 233
10 1 2 249
11 2 3 4
12 2 3 6
13 2 3 7
14 2 3 15
15 2 3 25
16 2 3 32
17 2 3 45
18 2 3 74
19 2 3 75
20 3 4 36
21 3 4 37
22 3 4 113
23 4 5 69
24 4 5 70
25 4 5 71
I then create a sparse matrix as the full dataset is way larger than that (10000+ x 1000)
sM <- sparseMatrix(i=dat$indexInstall, j=dat$time, x=1)
And show an image of it:
image(sM)
I want to color every line according to the indexUser column. For example, to plot user 1 in blue and all others in red.
Thanks in advance

How to reverse the order of two indices of a variable in R

I have a dataset that looks like
A T Value into T A Value
1 1 32 1 1 32
1 2 33 1 2 55
1 3 34 1 3 96
2 1 55 2 1 33
2 2 56 2 2 56
2 3 57 2 3 97
3 1 96 3 1 34
3 2 97 3 2 57
3 3 98 3 3 98
and I want to use reshape (in R) to reshape the object on the left so that the T index comes in the first column and the A index in the second column, to get the object on the right. I don't have the melt or cast functions.
Let df be your data.frame.
df <- df[order(df$T, df$A), c("T", "A", "Value")]
This can be found out easily by googling next time.
Looks like you just want to sort rows and move columns. If this is your sample input
tt<-read.table(text="A T Value
1 1 32
1 2 33
1 3 34
2 1 55
2 2 56
2 3 57
3 1 96
3 2 97
3 3 98", header=T)
you can do
tt[order(tt$T, tt$A), c("T","A","Value")]

transform input data

I'm struggling with a transformation in R.
My csv file is structured like following:
User Movie Rating
1 34 4
1 55 3
1 24 5
2 55 1
2 67 5
2 24 3
and so on. And I'd like to get a matrix like this (if a user hasn't rated a movie, insert 0 as rating):
24 34 55 67
5 4 3 0
3 0 1 5
where each row is a single user and the columns are movies, so each entry is a rating for a movie. I'm wondering if there is a simple solution in R after I've read in the csv above. I am trying a workaround with Python at the moment...
Thanks a lot.
Regards
> inp <- read.table(text="User Movie Rating
+ 1 34 4
+ 1 55 3
+ 1 24 5
+ 2 55 1
+ 2 67 5
+ 2 24 3
+ ", header=TRUE)
> xtabs(Rating ~ User+Movie, data=inp)
Movie
User 24 34 55 67
1 5 4 3 0
2 3 0 1 5
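The result of xtabs is a table object rather than a plain matrix. If a plain numeric matrix is needed, a minimal sketch of the conversion follows; the object names tab and ratings are made up for illustration. Keep in mind that xtabs sums the ratings within each User/Movie cell, so this assumes each user rated each movie at most once, and a 0 in the result cannot be told apart from a genuine rating of 0.

```r
# The example data from the question
inp <- data.frame(
  User   = c(1, 1, 1, 2, 2, 2),
  Movie  = c(34, 55, 24, 55, 67, 24),
  Rating = c(4, 3, 5, 1, 5, 3)
)

# Cross-tabulate: one row per user, one column per movie, 0 where unrated
tab <- xtabs(Rating ~ User + Movie, data = inp)

# Copy the values into a plain matrix, keeping the user and movie labels
ratings <- matrix(
  as.vector(tab),
  nrow = nrow(tab),
  dimnames = list(User = rownames(tab), Movie = colnames(tab))
)
```

Rows and columns can then be addressed by label, for example ratings["1", "34"].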

Resources