I want to calculate, within each group, the difference between each row and the previous row. However, I am getting this error:
Error in mutate():
! Problem while computing ..1 = across(where(is.numeric), diff).
ℹ The error occurred in group 1: vs = 0
Caused by error in across():
! Problem while computing column mpg.
Caused by error in dplyr_internal_error():
Run rlang::last_error() to see where the error occurred.
Here is what I have tried:
mtcars %>% group_by(vs) %>% mutate(across(where(is.numeric), diff))
This seems to do the trick:
mtcars %>% group_by(vs) %>% aggregate(. ~ vs, data=., diff) %>% as.data.frame() %>% unnest()
# A tibble: 30 × 11
vs mpg cyl disp hp drat wt qsec am gear carb
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 0 0 0 0 0 0 0.255 0.560 0 0 0
2 0 -2.3 2 200 65 -0.75 0.565 0 -1 -1 -2
3 0 -4.4 0 0 70 0.0600 0.130 -1.18 0 0 2
4 0 2.10 0 -84.2 -65 -0.140 0.500 1.56 0 0 -1
5 0 0.900 0 0 0 0 -0.340 0.200 0 0 0
6 0 -2.10 0 0 0 0 0.0500 0.400 0 0 0
7 0 -4.8 0 196. 25 -0.140 1.47 -0.0200 0 0 1
8 0 0 0 -12 10 0.0700 0.174 -0.160 0 0 0
9 0 4.3 0 -20 15 0.23 -0.0790 -0.400 0 0 0
10 0 0.800 0 -122 -80 -0.47 -1.82 -0.550 0 0 -2
# … with 20 more rows
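diff() returns a vector one element shorter than its input, so mutate() cannot fit its result back into each group. You could instead define the calculation explicitly using lag(), or you could do this in base R: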
library(tidyverse)
#tidyverse
mtcars %>%
group_by(vs) %>%
mutate(across(where(is.numeric), ~.-lag(., default = first(.)))) |>
arrange(vs)
#> # A tibble: 32 x 11
#> # Groups: vs [2]
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0 0 0 0 0 0 0 0 0 0 0
#> 2 0 0 0 0 0 0.255 0.560 0 0 0 0
#> 3 -2.3 2 200 65 -0.75 0.565 0 0 -1 -1 -2
#> 4 -4.4 0 0 70 0.0600 0.130 -1.18 0 0 0 2
#> 5 2.10 0 -84.2 -65 -0.140 0.500 1.56 0 0 0 -1
#> 6 0.900 0 0 0 0 -0.340 0.200 0 0 0 0
#> 7 -2.10 0 0 0 0 0.0500 0.400 0 0 0 0
#> 8 -4.8 0 196. 25 -0.140 1.47 -0.0200 0 0 0 1
#> 9 0 0 -12 10 0.0700 0.174 -0.160 0 0 0 0
#> 10 4.3 0 -20 15 0.23 -0.0790 -0.400 0 0 0 0
#> # ... with 22 more rows
#base R
by(mtcars, mtcars$vs, \(x) apply(x, 2, diff)) |>
do.call(what = rbind.data.frame)
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> 0.Mazda RX4 Wag 0.0 0 0.0 0 0.00 0.255 0.56 0 0 0 0
#> 0.Hornet Sportabout -2.3 2 200.0 65 -0.75 0.565 0.00 0 -1 -1 -2
#> 0.Duster 360 -4.4 0 0.0 70 0.06 0.130 -1.18 0 0 0 2
#> 0.Merc 450SE 2.1 0 -84.2 -65 -0.14 0.500 1.56 0 0 0 -1
#> 0.Merc 450SL 0.9 0 0.0 0 0.00 -0.340 0.20 0 0 0 0
#> 0.Merc 450SLC -2.1 0 0.0 0 0.00 0.050 0.40 0 0 0 0
#> 0.Cadillac Fleetwood -4.8 0 196.2 25 -0.14 1.470 -0.02 0 0 0 1
#> 0.Lincoln Continental 0.0 0 -12.0 10 0.07 0.174 -0.16 0 0 0 0
#> 0.Chrysler Imperial 4.3 0 -20.0 15 0.23 -0.079 -0.40 0 0 0 0
#> 0.Dodge Challenger 0.8 0 -122.0 -80 -0.47 -1.825 -0.55 0 0 0 -2
#> 0.AMC Javelin -0.3 0 -14.0 0 0.39 -0.085 0.43 0 0 0 0
#> 0.Camaro Z28 -1.9 0 46.0 95 0.58 0.405 -1.89 0 0 0 2
#> 0.Pontiac Firebird 5.9 0 50.0 -70 -0.65 0.005 1.64 0 0 0 -2
#> 0.Porsche 914-2 6.8 -4 -279.7 -84 1.35 -1.705 -0.35 0 1 2 0
#> 0.Ford Pantera L -10.2 4 230.7 173 -0.21 1.030 -2.20 0 0 0 2
#> 0.Ferrari Dino 3.9 -2 -206.0 -89 -0.60 -0.400 1.00 0 0 0 2
#> 0.Maserati Bora -4.7 2 156.0 160 -0.08 0.800 -0.90 0 0 0 2
#> 1.Hornet 4 Drive -1.4 2 150.0 17 -0.77 0.895 0.83 0 -1 -1 0
#> 1.Valiant -3.3 0 -33.0 -5 -0.32 0.245 0.78 0 0 0 0
#> 1.Merc 240D 6.3 -2 -78.3 -43 0.93 -0.270 -0.22 0 0 1 1
#> 1.Merc 230 -1.6 0 -5.9 33 0.23 -0.040 2.90 0 0 0 0
#> 1.Merc 280 -3.6 2 26.8 28 0.00 0.290 -4.60 0 0 0 2
#> 1.Merc 280C -1.4 0 0.0 0 0.00 0.000 0.60 0 0 0 0
#> 1.Fiat 128 14.6 -2 -88.9 -57 0.16 -1.240 0.57 0 1 0 -3
#> 1.Honda Civic -2.0 0 -3.0 -14 0.85 -0.585 -0.95 0 0 0 1
#> 1.Toyota Corolla 3.5 0 -4.6 13 -0.71 0.220 1.38 0 0 0 -1
#> 1.Toyota Corona -12.4 0 49.0 32 -0.52 0.630 0.11 0 -1 -1 0
#> 1.Fiat X1-9 5.8 0 -41.1 -31 0.38 -0.530 -1.11 0 1 1 0
#> 1.Lotus Europa 3.1 0 16.1 47 -0.31 -0.422 -2.00 0 0 1 1
#> 1.Volvo 142E -9.0 0 25.9 -4 0.34 1.267 1.70 0 0 -1 0
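Another option, sketched here in base R under the assumption that NA is an acceptable value for the first row of each group, pads each group's diff() output with a leading NA via ave(), so the result has one value per row and no rows are dropped. This is a minimal sketch, not the only way; the dplyr equivalent would be mtcars %>% group_by(vs) %>% mutate(across(where(is.numeric), ~ c(NA, diff(.x)))).

```r
# Within-group differences that keep every row of mtcars.
# The first row of each vs group has no predecessor, so it gets NA.
num_cols <- setdiff(names(mtcars), "vs")   # vs is the grouping column
diffs <- mtcars
diffs[num_cols] <- lapply(mtcars[num_cols], function(col) {
  # ave() applies FUN separately within each vs group and returns
  # a vector as long as col, in the original row order
  ave(col, mtcars$vs, FUN = function(x) c(NA, diff(x)))
})
head(diffs, 3)
```

Unlike the lag(..., default = first(.)) version, which reports 0 for the first row of each group, this makes it explicit that no difference exists there.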
My data frame contains about 60 observations and 15 variables. They are a mixture of continuous and binary data, but I've made sure all of the variables are numeric and have no NA values (I used na.omit()). I also used is.finite() and is.na() to check for infinite/NA values. Still, calling prcomp() on my data frame reports "Infinite or missing values in x". What might I be overlooking? I am new to R and just started learning, so I appreciate the help.
Some of my columns are still character and I am not sure how to convert them; using gsub() and then as.numeric() still gives me an error.
library(readxl)
Pca_for_R <- read_excel("~/Pca for R.xlsx")
sapply(Pca_for_R, as.numeric)
new <- gsub(",", "", Pca_for_R)
Mypca <- prcomp(Pca_for_R, center=TRUE, scale=TRUE)
Sample of my data:
CRT GFA MRT VA Contast Myp Hyp eso exo VFV VFH
[1,] 247 2.71 1283 0.63 2.50 0 0 1 0 0 0
[2,] 226 2.06 442 1.00 1.50 0 0 0 0 0 0
[3,] 251 2.16 420 1.00 1.25 0 0 0 0 0 0
[4,] 202 3.02 282 0.80 1.25 0 1 0 0 0 0
[5,] 252 2.17 640 0.50 1.50 0 0 0 0 0 0
[6,] 260 2.25 857 0.40 1.50 0 1 0 0 1 0
[7,] 255 2.51 736 0.63 1.20 0 1 1 0 0 0
[8,] 242 1.90 353 1.00 1.20 0 0 1 0 0 0
[9,] 206 1.90 292 0.80 1.20 1 0 0 0 0 0
[10,] 515 3.04 376 0.25 1.20 0 0 0 0 1 0
[11,] 222 2.13 424 0.80 10.00 0 1 0 0 1 0
[12,] 292 1.70 326 0.50 1.25 0 1 0 0 0 0
[13,] 207 2.55 427 1.00 2.50 0 1 0 0 0 0
[14,] 242 1.89 387 0.63 1.20 0 0 0 0 0 0
[15,] 205 1.86 341 1.00 2.50 0 0 0 1 0 0
[16,] 250 3.01 728 0.40 1.20 1 0 0 0 0 0
[17,] 269 3.51 410 0.50 6.00 1 0 0 0 0 1
[18,] 271 2.17 592 0.63 1.20 1 0 0 1 0 0
[19,] 264 1.52 235 0.63 1.20 0 0 0 0 0 1
[20,] 381 4.63 628 0.80 1.25 0 0 1 0 0 0
[21,] 342 3.35 422 0.30 2.50 0 0 1 0 0 0
[22,] 219 3.75 372 0.40 1.50 1 0 1 0 0 0
[23,] 306 3.35 564 0.40 3.00 0 0 0 0 0 0
[24,] 253 3.94 592 0.63 1.50 0 1 0 1 0 0
[25,] 268 2.13 387 1.00 1.25 0 0 0 0 0 0
[26,] 346 2.16 345 0.50 2.50 0 1 0 1 0 0
[27,] 289 1.79 370 0.50 1.50 0 0 0 0 0 0
[28,] 362 1.91 616 1.00 2.50 1 0 0 0 0 1
[29,] 321 3.65 791 0.50 5.00 0 0 0 0 0 0
[30,] 497 2.64 516 0.80 5.00 0 0 0 0 0 0
[31,] 291 2.52 900 1.00 5.00 0 0 0 0 0 0
[32,] 176 2.94 376 1.00 1.20 0 1 0 1 0 0
[33,] 192 2.00 336 0.32 2.00 0 1 0 0 0 0
[34,] 207 2.05 340 1.00 1.20 0 1 0 0 0 0
[35,] 331 2.05 480 0.80 1.20 0 1 0 0 0 0
[36,] 238 2.33 550 1.00 1.50 0 1 0 0 0 0
[37,] 205 4.32 554 0.63 5.00 0 1 0 0 0 0
[38,] 300 1.55 499 1.00 2.50 1 0 0 0 0 0
[39,] 374 2.92 687 1.00 5.00 0 0 0 0 0 0
[40,] 243 3.43 735 0.40 2.50 0 0 0 1 0 0
[41,] 221 2.39 489 0.50 1.25 0 0 0 0 0 0
[42,] 177 1.88 249 1.25 1.25 0 0 0 0 0 0
[43,] 377 3.35 581 0.50 5.00 0 0 0 0 0 0
[44,] 285 2.28 459 0.30 25.00 0 0 0 0 0 0
[45,] 230 2.17 438 1.00 1.80 0 1 0 0 0 0
[46,] 183 2.34 344 1.00 1.80 0 1 1 0 1 0
[47,] 245 1.63 418 0.50 1.25 0 1 1 0 0 0
[48,] 235 1.89 514 0.60 4.00 0 0 0 0 0 0
[49,] 179 2.89 525 0.30 4.00 1 0 1 0 0 0
[50,] 187 1.47 313 0.16 5.00 0 1 0 0 0 0
[51,] 243 2.48 331 0.63 3.00 1 0 0 0 1 0
[52,] 289 1.79 370 0.80 1.50 0 0 0 0 0 0
[53,] 287 2.80 569 0.60 6.00 0 1 0 1 0 0
[54,] 271 1.61 337 0.80 1.65 0 0 1 0 0 0
[55,] 198 1.70 429 0.80 1.25 0 0 0 0 0 0
[56,] 246 2.65 516 0.50 5.00 1 0 1 0 0 0
[57,] 318 2.16 746 0.25 8.00 0 0 0 1 0 0
[58,] 238 1.61 355 0.80 1.25 0 0 0 1 0 0
[59,] 268 2.13 387 0.32 1.50 0 0 0 0 0 1
[60,] 272 2.41 406 0.80 1.25 0 1 1 0 0 0
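The posted code never keeps its conversions: the result of sapply(Pca_for_R, as.numeric) is printed rather than assigned, gsub() is applied to the whole data frame instead of to each column (so new is a character vector, not a data frame), and prcomp() is then called on the unchanged Pca_for_R. A minimal sketch of a column-wise conversion follows, assuming the commas are thousands separators (if they are decimal commas, replace them with "." instead). The helper to_numeric_cols() and the toy data are hypothetical; the toy values come from the first four sample rows, with CRT and MRT stored as text and a thousands separator added to 1283 for illustration. Any text column that is not really a number (an ID column, for example) becomes NA, so drop such columns before converting.

```r
# Hypothetical helper: strip commas from character columns and convert them to numeric
to_numeric_cols <- function(df) {
  df[] <- lapply(df, function(col) {
    if (is.character(col)) as.numeric(gsub(",", "", col, fixed = TRUE)) else col
  })
  df
}

# Toy stand-in for the Excel data (hypothetical text formatting)
toy <- data.frame(CRT = c("247", "226", "251", "202"),
                  MRT = c("1,283", "442", "420", "282"),
                  Hyp = c(0, 0, 0, 1),
                  stringsAsFactors = FALSE)
clean <- to_numeric_cols(toy)

# Columns that still hold NA/NaN/Inf after conversion, e.g. text that was not a number
not_finite <- vapply(clean, function(col) any(!is.finite(col)), logical(1))
names(clean)[not_finite]

# Assign the cleaned data and run the PCA on it, not on the original object
Mypca <- prcomp(na.omit(clean), center = TRUE, scale. = TRUE)

# With the real data this would become, for example:
# Mypca <- prcomp(na.omit(to_numeric_cols(Pca_for_R)), center = TRUE, scale. = TRUE)
```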
I am trying to create a bar graph showing the counts of a variable (WHPREJUDICE6R) that I binned into quintiles. I want the bars to be split by gender, which is also in the data.
So for each prejudice bin, the graph should show how often it occurs in the data, broken down by gender.
library(ggplot2)
prejiducequint = quantile(dta.sub2$WHPREJUDICE6R, seq(0, 1, length = 5))
prejiducecut = cut(dta.sub2$WHPREJUDICE6R, breaks = prejiducequint)
p = ggplot(dta.sub2, aes(prejiducecut, fill = FEMALE6)) +
  geom_bar(position = "dodge")
p
Here is the data:
WHPREJUDICE6R prejiducecut FEMALE6
1 -0.005 (-0.0125,0] 0
2 0.075 (0.0275,1] 0
3 0.000 (-0.0125,0] 0
4 0.000 (-0.0125,0] 1
5 -0.020 (-0.44,-0.0125] 1
6 0.130 (0.0275,1] 1
7 0.090 (0.0275,1] 0
8 0.230 (0.0275,1] 0
9 0.100 (0.0275,1] 1
10 0.870 (0.0275,1] 1
11 -0.130 (-0.44,-0.0125] 1
12 -0.010 (-0.0125,0] 1
13 0.005 (0,0.0275] 0
14 -0.010 (-0.0125,0] 1
15 -0.060 (-0.44,-0.0125] 1
16 0.000 (-0.0125,0] 1
17 -0.010 (-0.0125,0] 1
18 0.000 (-0.0125,0] 1
19 -0.075 (-0.44,-0.0125] 1
20 0.005 (0,0.0275] 1
21 -0.010 (-0.0125,0] 0
22 -0.060 (-0.44,-0.0125] 1
23 0.500 (0.0275,1] 0
24 0.020 (0,0.0275] 0
25 0.135 (0.0275,1] 0
26 -0.055 (-0.44,-0.0125] 1
27 -0.440 <NA> 0
28 0.000 (-0.0125,0] 0
29 -0.065 (-0.44,-0.0125] 0
30 0.000 (-0.0125,0] 1
31 0.035 (0.0275,1] 1
32 -0.005 (-0.0125,0] 0
33 0.000 (-0.0125,0] 1
34 -0.290 (-0.44,-0.0125] 1
35 0.005 (0,0.0275] 1
36 0.300 (0.0275,1] 0
37 0.005 (0,0.0275] 1
38 0.070 (0.0275,1] 1
39 -0.195 (-0.44,-0.0125] 1
40 -0.260 (-0.44,-0.0125] 0
41 -0.040 (-0.44,-0.0125] 1
42 0.720 (0.0275,1] 0
43 0.045 (0.0275,1] 1
44 0.125 (0.0275,1] 1
45 0.035 (0.0275,1] 0
46 0.005 (0,0.0275] 1
47 0.000 (-0.0125,0] 0
48 0.000 (-0.0125,0] 0
49 0.000 (-0.0125,0] 1
50 0.010 (0,0.0275] 1
51 0.495 (0.0275,1] 1
52 0.000 (-0.0125,0] 1
53 0.000 (-0.0125,0] 1
54 0.010 (0,0.0275] 0
55 -0.015 (-0.44,-0.0125] 1
56 -0.110 (-0.44,-0.0125] 0
57 0.000 (-0.0125,0] 0
58 0.065 (0.0275,1] 1
59 0.255 (0.0275,1] 1
60 -0.020 (-0.44,-0.0125] 1
61 0.070 (0.0275,1] 1
62 0.000 (-0.0125,0] 0
63 1.000 (0.0275,1] 0
64 0.000 (-0.0125,0] 1
65 0.490 (0.0275,1] 0
66 -0.005 (-0.0125,0] 1
67 0.000 (-0.0125,0] 0
68 0.010 (0,0.0275] 1
69 0.000 (-0.0125,0] 1
70 -0.065 (-0.44,-0.0125] 1
71 0.005 (0,0.0275] 0
72 -0.065 (-0.44,-0.0125] 0
73 0.060 (0.0275,1] 0
74 0.000 (-0.0125,0] 0
75 0.000 (-0.0125,0] 1
76 0.155 (0.0275,1] 0
77 -0.190 (-0.44,-0.0125] 0
78 0.000 (-0.0125,0] 0
79 -0.065 (-0.44,-0.0125] 0
80 0.005 (0,0.0275] 1
81 0.060 (0.0275,1] 0
82 -0.100 (-0.44,-0.0125] 1
83 0.000 (-0.0125,0] 1
84 0.005 (0,0.0275] 0
85 0.000 (-0.0125,0] 1
86 0.300 (0.0275,1] 1
87 -0.070 (-0.44,-0.0125] 1
88 0.430 (0.0275,1] 0
89 -0.060 (-0.44,-0.0125] 1
90 -0.005 (-0.0125,0] 1
91 0.000 (-0.0125,0] 1
92 -0.005 (-0.0125,0] 1
93 0.015 (0,0.0275] 0
94 -0.205 (-0.44,-0.0125] 0
95 0.000 (-0.0125,0] 1
96 0.045 (0.0275,1] 0
97 -0.075 (-0.44,-0.0125] 0
98 0.000 (-0.0125,0] 0
99 0.000 (-0.0125,0] 1
100 0.000 (-0.0125,0] 1
101 0.235 (0.0275,1] 0
102 -0.060 (-0.44,-0.0125] 1
103 0.505 (0.0275,1] 0
104 -0.185 (-0.44,-0.0125] 1
105 0.185 (0.0275,1] 0
106 -0.115 (-0.44,-0.0125] 0
107 0.005 (0,0.0275] 0
108 -0.440 <NA> 1
109 -0.100 (-0.44,-0.0125] 1
110 0.430 (0.0275,1] 1
111 -0.005 (-0.0125,0] 0
112 0.000 (-0.0125,0] 1
113 0.000 (-0.0125,0] 0
114 -0.120 (-0.44,-0.0125] 1
115 0.005 (0,0.0275] 0
116 0.145 (0.0275,1] 0
117 0.110 (0.0275,1] 0
118 -0.010 (-0.0125,0] 0
119 0.000 (-0.0125,0] 1
120 -0.005 (-0.0125,0] 0
121 -0.060 (-0.44,-0.0125] 0
122 0.120 (0.0275,1] 1
123 -0.240 (-0.44,-0.0125] 0
124 -0.005 (-0.0125,0] 0
125 0.000 (-0.0125,0] 1
126 -0.060 (-0.44,-0.0125] 0
127 -0.305 (-0.44,-0.0125] 1
128 0.050 (0.0275,1] 0
129 0.000 (-0.0125,0] 0
130 -0.005 (-0.0125,0] 0
131 -0.005 (-0.0125,0] 0
132 0.000 (-0.0125,0] 1
133 -0.045 (-0.44,-0.0125] 1
134 -0.005 (-0.0125,0] 0
135 0.000 (-0.0125,0] 1
136 -0.065 (-0.44,-0.0125] 1
137 0.000 (-0.0125,0] 1
138 0.055 (0.0275,1] 0
139 0.020 (0,0.0275] 1
140 0.000 (-0.0125,0] 0
141 0.000 (-0.0125,0] 0
142 -0.005 (-0.0125,0] 1
143 0.005 (0,0.0275] 1
The graph gets made; however, it is not segmented by gender (FEMALE6). FEMALE6 is a variable that is either 0 or 1 depending on the person's gender.
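A minimal sketch of the usual fix, assuming FEMALE6 is stored as a numeric 0/1 column: ggplot2 treats a numeric fill as continuous, so it does not create separate groups to dodge, whereas converting FEMALE6 to a factor gives one bar per gender in each bin. The sketch also passes include.lowest = TRUE to cut() so the minimum value (-0.440, which appears as <NA> in the posted data) falls into the first bin, and stores the bins as a column of the data frame. Note that seq(0, 1, length = 5) produces quartile breaks (four bins); length.out = 6 would give quintiles. The 12 rows below are the first rows of the posted data, used only for illustration.

```r
library(ggplot2)

# First 12 rows of the posted data (WHPREJUDICE6R and FEMALE6 only)
dta.sub2 <- data.frame(
  WHPREJUDICE6R = c(-0.005, 0.075, 0.000, 0.000, -0.020, 0.130,
                    0.090, 0.230, 0.100, 0.870, -0.130, -0.010),
  FEMALE6 = c(0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1, 1)
)

# Quartile breaks; unique() guards against repeated break values
prejiducequint <- quantile(dta.sub2$WHPREJUDICE6R, probs = seq(0, 1, length.out = 5))
# include.lowest = TRUE keeps the minimum value instead of turning it into NA
dta.sub2$prejiducecut <- cut(dta.sub2$WHPREJUDICE6R,
                             breaks = unique(prejiducequint),
                             include.lowest = TRUE)

# A discrete fill is what makes ggplot2 split each bar by gender
dta.sub2$FEMALE6 <- factor(dta.sub2$FEMALE6, levels = c(0, 1))

p <- ggplot(dta.sub2, aes(x = prejiducecut, fill = FEMALE6)) +
  geom_bar(position = "dodge")
p
```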
I'm attempting to apply the km function from the DiceKriging package to a multivariate dataset. When I use my entire dataset (dimensions = [938,13]), the algorithm runs without problem. When I use a smaller subset of this dataset (dimensions = [94,13]), I encounter this error:
Error in t.default(T) : argument is not a matrix
I'm not sure what's causing this error to occur when analyzing the data subset. For reference, here's the data and code I'm running:
g<-km(design=sub.ev,response=sub.rv,covtype="matern5_2")
> sub.ev
1 -0.1519272795 1 0 0 27.769 27.45715 8.02 7.330 21.16 17.73979 0.000 4119.280 0.5338750
2 -0.1436123857 1 0 0 27.420 27.45715 8.04 6.895 20.19 17.73979 0.000 4119.280 0.5338750
3 -0.1144934053 1 0 0 27.021 27.45715 8.06 6.690 19.54 17.73979 0.000 4119.280 0.5338750
4 -0.1585420923 1 0 0 26.531 27.45715 8.09 6.160 18.64 17.73979 0.000 4119.280 0.5338750
5 -0.0588867480 1 0 0 26.107 27.45715 8.11 5.790 17.71 17.73979 0.000 4119.280 0.5338750
6 -0.0402306730 1 0 0 25.704 27.45715 8.15 5.840 17.26 17.73979 0.000 4119.280 0.5338750
7 0.0161781773 1 0 0 25.265 27.45715 8.18 5.390 16.77 17.73979 0.000 4119.280 0.5338750
8 0.0660891620 1 0 0 24.967 27.45715 8.21 5.425 16.18 17.73979 0.000 4119.280 0.5338750
9 0.0079745010 1 0 0 24.665 27.45715 8.24 5.250 16.51 17.73979 0.000 4119.280 0.5338750
10 -0.0191421967 1 0 0 24.416 27.45715 8.27 5.090 15.46 17.73979 0.000 4119.280 0.5338750
11 0.0364618430 1 0 0 24.232 27.45715 8.29 4.820 14.47 17.73979 0.000 4119.280 0.5338750
12 0.0160652203 1 0 0 24.077 27.45715 8.31 4.745 15.57 17.73979 0.000 4119.280 0.5338750
13 0.0763233707 1 0 0 23.906 27.45715 8.33 4.680 15.14 17.73979 0.000 4119.280 0.5338750
14 0.0211064293 1 0 0 23.794 27.45715 8.35 4.940 18.25 17.73979 0.000 4119.280 0.5338750
15 0.0166228227 1 0 0 23.642 27.45715 8.36 5.190 14.60 17.73979 0.000 4119.280 0.5338750
16 0.0022511447 1 0 0 23.531 27.45715 8.37 5.425 15.30 17.73979 0.000 4119.280 0.5338750
17 -0.0122220320 1 0 0 23.445 27.45715 8.39 6.110 14.79 17.73979 0.000 4119.280 0.5338750
18 -0.0057961107 1 0 0 23.442 27.45715 8.39 5.940 14.49 17.73979 0.000 4119.280 0.5338750
19 -0.0322631907 1 0 0 23.512 27.45715 8.39 5.380 14.38 17.73979 0.000 4119.280 0.5338750
95 0.0159845000 1 0 0 27.306 27.45715 8.24 7.380 17.14 17.73979 0.000 4119.280 0.5338750
96 -0.0437959553 1 0 0 27.054 27.45715 8.25 7.100 15.35 17.73979 0.000 4119.280 0.5338750
97 -0.0805486920 1 0 0 26.520 26.88009 8.27 6.900 14.87 16.96688 0.000 3725.365 0.5422500
98 -0.0941385073 1 0 0 26.306 26.88009 8.30 6.725 15.19 16.96688 0.000 3725.365 0.5422500
99 -0.1159254400 1 0 0 26.039 26.88009 8.32 6.590 15.42 16.96688 0.000 3725.365 0.5422500
100 -0.0362266430 1 0 0 25.637 26.88009 8.36 6.280 14.34 16.96688 0.000 3725.365 0.5422500
101 0.0326682983 1 0 0 25.299 26.88009 8.38 6.100 14.36 16.96688 0.000 3725.365 0.5422500
102 0.0471005793 1 0 0 24.742 26.88009 8.44 5.955 13.48 16.96688 0.000 3725.365 0.5422500
103 -0.0596346010 1 0 0 24.262 26.88009 8.49 5.530 12.91 16.96688 0.000 3725.365 0.5422500
104 -0.0043536683 1 0 0 23.856 26.88009 8.52 5.315 12.80 16.96688 0.000 3725.365 0.5422500
105 -0.0177714297 1 0 0 23.578 26.88009 8.54 5.240 13.01 16.96688 0.000 3725.365 0.5422500
106 0.0169363000 1 0 0 23.313 26.88009 8.56 5.225 12.69 16.96688 0.000 3725.365 0.5422500
107 -0.0170451183 1 0 0 23.023 26.88009 8.59 5.090 12.71 16.96688 0.000 3725.365 0.5422500
108 0.0231896353 1 0 0 22.755 26.88009 8.62 5.280 11.58 16.96688 0.000 3725.365 0.5422500
109 0.0053651757 1 0 0 22.510 26.88009 8.65 5.690 11.28 16.96688 0.000 3725.365 0.5422500
110 0.0281674793 1 0 0 22.342 26.88009 8.68 5.555 11.21 16.96688 0.000 3725.365 0.5422500
111 0.0009483843 1 0 0 22.168 26.88009 8.70 5.770 10.68 16.96688 0.000 3725.365 0.5422500
112 0.0147559413 1 0 0 22.151 26.88009 8.69 5.995 11.57 16.96688 0.000 3725.365 0.5422500
190 -0.0115338953 1 0 0 27.062 25.91990 8.21 8.000 15.35 14.07031 0.000 3949.390 0.5342917
191 -0.0189870410 1 0 0 26.545 25.91990 8.22 7.870 14.21 14.07031 0.000 3949.390 0.5342917
192 -0.0184237180 1 0 0 26.104 25.91990 8.25 7.795 20.04 14.07031 0.000 3949.390 0.5342917
193 -0.0319295797 1 0 0 25.859 25.91990 8.26 7.730 13.14 14.07031 0.000 3949.390 0.5342917
194 -0.0184753123 1 0 0 25.573 25.91990 8.30 7.585 12.93 14.07031 0.000 3949.390 0.5342917
195 -0.0197481060 1 0 0 25.005 25.91990 8.36 7.490 11.29 14.07031 0.000 3949.390 0.5342917
196 -0.0215467360 1 0 0 24.710 25.91990 8.39 7.245 11.10 14.07031 0.000 3949.390 0.5342917
197 0.0265223447 1 0 0 24.455 25.91990 8.42 7.240 11.44 14.07031 0.000 3949.390 0.5342917
198 0.0470763840 1 0 0 24.087 25.91990 8.45 7.225 11.42 14.07031 0.000 3949.390 0.5342917
199 0.0622169450 1 0 0 23.673 25.91990 8.48 7.260 11.20 14.07031 0.000 3949.390 0.5342917
200 -0.0104582193 1 0 0 23.301 25.91990 8.52 7.190 11.57 14.07031 0.000 3949.390 0.5342917
201 0.0121972077 1 0 0 23.005 25.91990 8.55 7.220 10.71 14.07031 0.000 3949.390 0.5342917
202 0.0219721027 1 0 0 22.745 25.91990 8.59 7.245 10.86 14.07031 0.000 3949.390 0.5342917
203 0.0208879210 1 0 0 22.576 25.91990 8.61 7.180 10.62 14.07031 0.000 3949.390 0.5342917
204 0.0192644400 1 0 0 22.417 25.91990 8.63 7.220 10.47 14.07031 0.000 3949.390 0.5342917
205 0.0066226250 1 0 0 22.243 25.91990 8.65 7.170 10.29 14.07031 0.000 3949.390 0.5342917
206 0.0036012053 1 0 0 22.136 25.91990 8.67 7.200 10.86 14.07031 0.000 3949.390 0.5342917
207 -0.0027906963 1 0 0 22.012 25.91990 8.69 7.150 10.57 14.07031 0.000 3949.390 0.5342917
290 0.2266387893 1 0 0 27.763 26.21360 8.21 7.765 23.37 17.97479 0.000 3554.510 0.5938333
291 0.1151646527 1 0 0 27.377 26.21360 8.23 7.720 24.43 17.97479 0.000 3554.510 0.5938333
292 -0.0218444193 1 0 0 27.285 26.21360 8.23 7.655 23.40 17.97479 0.000 3554.510 0.5938333
293 -0.0908422353 1 0 0 26.935 26.21360 8.26 7.590 21.96 17.97479 0.000 3554.510 0.5938333
294 -0.1716709177 1 0 0 26.792 26.21360 8.28 7.540 23.30 17.97479 0.000 3554.510 0.5938333
295 -0.1943201847 1 0 0 26.869 26.21360 8.28 7.420 25.06 17.97479 0.000 3554.510 0.5938333
296 -0.1529985130 1 0 0 26.733 26.21360 8.28 7.310 33.62 17.97479 0.000 3554.510 0.5938333
297 -0.1106344563 1 0 0 26.394 26.21360 8.29 7.160 29.06 17.97479 0.000 3554.510 0.5938333
298 -0.0638089193 1 0 0 25.973 26.21360 8.32 7.010 23.25 17.97479 0.000 3554.510 0.5938333
299 -0.1208449610 1 0 0 25.501 26.21360 8.37 6.970 21.10 17.97479 0.000 3554.510 0.5938333
300 -0.2310616323 1 0 0 25.192 26.21360 8.40 7.010 19.16 17.97479 0.000 3554.510 0.5938333
301 -0.2043969970 1 0 0 24.867 26.21360 8.43 7.030 17.80 17.97479 0.000 3554.510 0.5938333
302 -0.2003585363 1 0 0 24.634 26.21360 8.45 7.080 17.01 17.97479 0.000 3554.510 0.5938333
303 -0.2535806687 1 0 0 24.468 26.21360 8.47 7.130 16.41 17.97479 0.000 3554.510 0.5938333
304 -0.2464920640 1 0 0 24.234 26.21360 8.50 7.055 15.50 17.97479 0.000 3554.510 0.5938333
389 0.2461277410 1 0 0 27.981 26.78711 8.29 6.760 20.65 22.02125 0.000 3589.260 0.7715417
390 0.1660650063 1 0 0 27.915 26.78711 8.29 6.575 22.44 22.02125 0.000 3589.260 0.7715417
391 -0.0609562143 1 0 0 27.757 26.78711 8.30 6.580 22.92 22.02125 0.000 3589.260 0.7715417
392 -0.2020911323 1 0 0 27.527 26.78711 8.32 6.590 22.91 22.02125 0.000 3589.260 0.7715417
393 -0.2980735343 1 0 0 27.563 26.78711 8.31 6.510 24.30 22.02125 0.000 3589.260 0.7715417
394 -0.2370078227 1 0 0 27.400 26.78711 8.32 6.525 22.67 22.02125 0.000 3589.260 0.7715417
395 -0.3440380117 1 0 0 27.274 26.78711 8.32 6.530 22.45 22.02125 0.000 3589.260 0.7715417
396 -0.0057092573 1 0 0 27.132 26.78711 8.30 6.490 22.08 22.02125 0.000 3589.260 0.7715417
397 -0.0241878650 1 0 0 27.018 26.78711 8.30 6.450 24.00 22.02125 0.000 3589.260 0.7715417
398 -0.2080665820 1 0 0 26.834 26.78711 8.32 6.405 25.81 22.02125 0.000 3589.260 0.7715417
399 -0.1716383953 1 0 0 26.637 26.78711 8.33 6.340 25.19 22.02125 0.000 3589.260 0.7715417
400 -0.2570107420 1 0 0 26.476 26.78711 8.34 6.300 22.05 22.02125 0.000 3589.260 0.7715417
495 -0.1302196527 1 0 0 25.967 26.53893 8.40 6.300 11.90 15.57448 2.286 2313.087 0.6087083
496 -0.0391473870 1 0 0 25.708 26.53893 8.41 6.260 10.77 15.57448 2.286 2313.087 0.6087083
587 -0.0382961500 1 0 0 28.647 26.98353 8.28 8.110 17.52 12.12667 0.000 3980.156 0.4227083
588 -0.0035965477 1 0 0 28.652 26.98353 8.26 7.920 21.14 12.12667 0.000 3980.156 0.4227083
589 -0.0050414577 1 0 0 28.307 26.98353 8.27 7.830 18.39 12.12667 0.000 3980.156 0.4227083
590 0.0354186967 1 0 0 27.896 26.98353 8.27 6.825 17.35 12.12667 0.000 3980.156 0.4227083
591 -0.0676664363 1 0 0 27.581 26.98353 8.27 6.110 16.06 12.12667 0.000 3980.156 0.4227083
592 -0.1723716683 1 0 0 27.223 26.98353 8.29 6.245 15.14 12.12667 0.000 3980.156 0.4227083
688 0.1598606430 1 0 0 28.738 27.54833 8.35 5.965 24.99 15.51010 0.000 2554.804 0.4730417
912 -0.1977195740 1 0 0 26.653 27.24447 9.09 7.635 11.38 12.71729 0.000 4002.945 0.5135417
913 -0.1306465143 1 0 0 27.399 27.24447 9.07 7.420 12.84 12.71729 0.000 4002.945 0.5135417
914 -0.1175953210 1 0 0 28.024 27.24447 9.01 7.270 13.91 12.71729 0.000 4002.945 0.5135417
> sub.rv
[1] 3.24 2.65 2.18 1.75 1.52 1.32 1.25 1.18 1.16 1.10 1.05 1.01 0.95 0.88 0.83 0.76 0.72 0.61 0.58 3.64 3.46 2.97 2.64 2.27 2.07 1.72 1.33 1.03 0.89 0.82 0.69 0.64 0.60
[34] 0.55 0.44 0.33 0.26 4.65 4.10 3.55 3.30 2.70 2.12 1.74 1.53 1.27 1.02 0.88 0.76 0.66 0.58 0.54 0.47 0.40 0.28 5.87 5.24 5.09 4.22 3.90 3.94 3.41 2.75 2.18 1.86 1.67
[67] 1.52 1.42 1.34 1.35 5.88 5.56 4.88 3.98 3.76 3.31 3.11 2.78 2.41 1.97 1.65 1.39 1.63 1.44 4.08 4.83 3.84 3.23 2.83 2.36 4.33 1.86 3.17 4.48
I've run into the same issue. It is resolved by removing the variables in the design matrix that have zero variance (they are simply constants). In the subset you printed, these are the 2nd to 4th columns of sub.ev (always 1, 0 and 0), so you can drop them with sub.ev[, -(2:4)].
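A minimal sketch that automates this advice, assuming sub.ev is a data frame or numeric matrix: the hypothetical helper drop_constant_columns() keeps only columns with more than one distinct value, so it still works if a different subset makes other columns constant. The demo uses the first three rows and first five columns of the posted sub.ev. The km() call on the real data is left as a comment because it needs the DiceKriging package and the full sub.ev/sub.rv objects.

```r
# Hypothetical helper: keep only the columns of a design that actually vary
drop_constant_columns <- function(design) {
  varies <- vapply(as.data.frame(design),
                   function(col) length(unique(col)) > 1L,
                   logical(1))
  design[, varies, drop = FALSE]
}

# First three rows and five columns of the posted sub.ev, for illustration only
demo <- data.frame(V1 = c(-0.1519272795, -0.1436123857, -0.1144934053),
                   V2 = c(1, 1, 1),
                   V3 = c(0, 0, 0),
                   V4 = c(0, 0, 0),
                   V5 = c(27.769, 27.420, 27.021))
reduced <- drop_constant_columns(demo)
names(reduced)  # "V1" "V5"

# With the real data (requires the DiceKriging package):
# g <- DiceKriging::km(design = drop_constant_columns(sub.ev),
#                      response = sub.rv, covtype = "matern5_2")
```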
This is my data frame a:
ui 194635691 194153563 177382028 177382031 195129144 196972549 196258704 194907960 196950156 194139014 153444738
1 56320e0e55e89c3e14e26d3d 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.000 0 0
2 563734c3b65dd40e340eaa56 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
3 563e12657d4c410c5832579c 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.000 0 0
4 565181854c24b410e4891e11 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.000 0 0
5 5651b53fec231f1df8482d23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.027 0 0
6 56548df4b84c321fe4cdfb8f 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
7 56549946735e782a885957e6 0.00 0.00 0.00 0.00 0.08 0.00 0.00 0.00 0.000 0 0
8 56549f9bb84c321fe4ce7a37 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
9 5654a35a735e782a8859a053 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.000 0 0
What I want to do is calculate the cosine similarity between useridvector and each row of data frame a, excluding the first column (ui).
I have tried the following code so far:
user_id=actions_slippers$ui[i]#user_id is coming from another dataframe called action_slippers
useridvector=a[a$ui %in% user_id, ]
p=as.vector(cosine(t(a[,2:ncol(a)]))[,1])# this measures cosine similarity between first row of dataframe a and each other of rows from dataframe a
but this compares the first row of a with every row, whereas I want the cosine similarity between useridvector and each row of a (again excluding the first column).
useridvector looks like this:
ui 194635691 194153563 177382028 177382031 195129144 196972549 196258704 194907960 196950156 194139014 153444738
5651b53fec231f1df8482d23 0 0 0 0 0 0 0 0 0.027 0 0
Can anyone tell me how to do this?
cosine() from the lsa package works; here is my attempt.
Suppose you have stored the data in a data frame like this:
> data
ui X194635691 X194153563 X177382028 X177382031 X195129144 X196972549 X196258704 X194907960 X196950156 X194139014 X153444738
1 56320e0e55e89c3e14e26d3d 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.000 0 0
2 563734c3b65dd40e340eaa56 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
3 563e12657d4c410c5832579c 0.00 0.00 0.01 0.01 0.00 0.00 0.00 0.00 0.000 0 0
4 565181854c24b410e4891e11 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.00 0.000 0 0
5 5651b53fec231f1df8482d23 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.027 0 0
6 56548df4b84c321fe4cdfb8f 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
7 56549946735e782a885957e6 0.00 0.00 0.00 0.00 0.08 0.00 0.00 0.00 0.000 0 0
8 56549f9bb84c321fe4ce7a37 0.00 0.01 0.00 0.00 0.00 0.00 0.00 0.00 0.000 0 0
9 5654a35a735e782a8859a053 0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.00 0.000 0 0
Use data[, -1] or subset(data, select = names(data)[-1]) to drop the first column, then convert the result to a matrix and apply lsa::cosine() to its transpose (cosine() compares columns, so t() makes each user a column):
> res <- lsa::cosine(t(as.matrix(data[, -1])))
> res
[,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9]
[1,] 1 0 0 0 0 0 0 0 0
[2,] 0 1 0 0 0 0 0 0 0
[3,] 0 0 1 0 0 0 0 0 0
[4,] 0 0 0 1 0 0 0 0 0
[5,] 0 0 0 0 1 0 0 0 0
[6,] 0 0 0 0 0 1 0 1 0
[7,] 0 0 0 0 0 0 1 0 0
[8,] 0 0 0 0 0 1 0 1 0
[9,] 0 0 0 0 0 0 0 0 1
PS: install the lsa package and see ?cosine for details.
============================ update =====
The resulting matrix looks like this (columns user2 and user4 are the ones of interest below):

          user1  user2  user3  user4
   user1    1      0     ...    ...
   user2   ...     1     ...    ...
   user3   ...    ...     1     ...
   user4   ...    ...    ...     1
where element (i, j) is the similarity between user i and user j.
If your user IDs contain two users, say user 2 and user 4, then the similarities between these two users and all other users form a submatrix of the full similarity matrix, which you can extract with res[, c(2, 4)].
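If you only need the similarities for the rows matching user_id, a minimal base R sketch can skip building the full matrix: cosine similarity is the dot product of two vectors divided by the product of their lengths, computed here for every row of a against the selected row(s). The helper cosine_to_rows() is hypothetical; for rows that are not all zero it should give the same values as the corresponding columns of lsa::cosine(t(as.matrix(a[, -1]))). The nine rows are the posted data, with the item columns renamed X1 to X11 by data.frame(); a row that is entirely zero would yield NaN.

```r
# Posted data: 9 users (ui) x 11 item columns, mostly zeros
ui <- c("56320e0e55e89c3e14e26d3d", "563734c3b65dd40e340eaa56",
        "563e12657d4c410c5832579c", "565181854c24b410e4891e11",
        "5651b53fec231f1df8482d23", "56548df4b84c321fe4cdfb8f",
        "56549946735e782a885957e6", "56549f9bb84c321fe4ce7a37",
        "5654a35a735e782a8859a053")
vals <- matrix(0, nrow = 9, ncol = 11)
vals[1, 8] <- 0.01; vals[2, 1] <- 0.01; vals[3, 3:4] <- 0.01
vals[4, 6] <- 0.01; vals[5, 9] <- 0.027; vals[6, 2] <- 0.01
vals[7, 5] <- 0.08; vals[8, 2] <- 0.01; vals[9, 7] <- 0.01
a <- data.frame(ui = ui, vals, stringsAsFactors = FALSE)

# Hypothetical helper: cosine similarity of every row of m with every row of target
cosine_to_rows <- function(m, target) {
  dots  <- m %*% t(target)                                      # pairwise dot products
  norms <- outer(sqrt(rowSums(m^2)), sqrt(rowSums(target^2)))   # products of vector lengths
  dots / norms
}

user_id <- "5651b53fec231f1df8482d23"          # the useridvector from the question
a_num   <- as.matrix(a[, -1])                   # drop the ui column
target  <- a_num[a$ui %in% user_id, , drop = FALSE]
sims    <- cosine_to_rows(a_num, target)
rownames(sims) <- a$ui
sims
```

Because a$ui %in% user_id can match several users, target may have several rows, and sims then has one column per matching user, which corresponds to the res[, c(2, 4)] submatrix described above.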