Add variable to melt function - r

Here is my input, question is below:
Targets <- read.csv("miR155Aicda.csv", row.names=1, sep="", header=T)
head(Targets)
T0h T0.25h T0.5h T1h T2h T3h T6h T12h T24h T48h C0h C0.25h C0.5h C1h C2h
Aicda 785 1150 707 513 1265 3268 8294 8625 7387 4397 677 911 673 737 1782
mmu-miR-155-3p 622 548 558 1213 1195 1172 1115 1883 3257 1900 499 562 584 543 580
Targets.m <- melt(Targets)
> head(Targets.m)
variable value
1 T0h 9.616549
2 T0h 9.280771
3 T0.25h 10.167418
4 T0.25h 9.098032
5 T0.5h 9.465566
6 T0.5h 9.124121
Question: How do I add the row names (Aicda and mmu-miR-155-3p) as a variable?
I want this:
ID variable value
1 Aicda T0h 9.616549
2 mmu-miR-155-3p T0h 9.280771
3 Aicda T0.25h 10.167418
4 mmu-miR-155-3p T0.25h 9.098032
5 Aicda T0.5h 9.465566
6 mmu-miR-155-3p T0.5h 9.124121

You need to put the rownames in a column and then specify an id variable for melt:
DF <- read.table(text=" T0h T0.25h T0.5h T1h T2h T3h T6h T12h T24h T48h C0h C0.25h C0.5h C1h C2h
Aicda 785 1150 707 513 1265 3268 8294 8625 7387 4397 677 911 673 737 1782
mmu-miR-155-3p 622 548 558 1213 1195 1172 1115 1883 3257 1900 499 562 584 543 580",header=TRUE)
DF$ID <- rownames(DF)
library(reshape2)
melt(DF,id="ID")
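If reshape2 is not available, roughly the same long format can be sketched in base R by going through a table. This is only an illustration on the first three columns of the data above; the object name long and the column names ID, variable and value are chosen here to match the desired output.

```r
# First three time points of the data above. The header line has one field
# fewer than the data lines, so read.table() uses the first field as row names.
DF <- read.table(text = "T0h T0.25h T0.5h
Aicda 785 1150 707
mmu-miR-155-3p 622 548 558", header = TRUE)

# as.table() keeps the row and column names; as.data.frame() then returns
# one row per cell, with the row name varying fastest.
long <- as.data.frame(as.table(as.matrix(DF)), responseName = "value")
names(long)[1:2] <- c("ID", "variable")
long
```

The ID and variable columns come back as factors here; wrap them in as.character() if plain strings are needed.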

Related

R: Different Ways to Name Matrices

I am working with the R programming language.
In the "datasets" library in R, there is a data set called "eurodist" that contains the distance between each combination of cities :
library(datasets)
This data set can be then converted into a "matrix":
eurodist = as.matrix(eurodist)
Athens Barcelona Brussels Calais Cherbourg Cologne Copenhagen Geneva Gibraltar Hamburg Hook of Holland Lisbon Lyons Madrid Marseilles Milan Munich Paris Rome Stockholm Vienna
Athens 0 3313 2963 3175 3339 2762 3276 2610 4485 2977 3030 4532 2753 3949 2865 2282 2179 3000 817 3927 1991
Barcelona 3313 0 1318 1326 1294 1498 2218 803 1172 2018 1490 1305 645 636 521 1014 1365 1033 1460 2868 1802
Brussels 2963 1318 0 204 583 206 966 677 2256 597 172 2084 690 1558 1011 925 747 285 1511 1616 1175
Calais 3175 1326 204 0 460 409 1136 747 2224 714 330 2052 739 1550 1059 1077 977 280 1662 1786 1381
Cherbourg 3339 1294 583 460 0 785 1545 853 2047 1115 731 1827 789 1347 1101 1209 1160 340 1794 2196 1588
Cologne 2762 1498 206 409 785 0 760 1662 2436 460 269 2290 714 1764 1035 911 583 465 1497 1403 937
Copenhagen 3276 2218 966 1136 1545 760 0 1418 3196 460 269 2971 1458 2498 1778 1537 1104 1176 2050 650 1455
Geneva 2610 803 677 747 853 1662 1418 0 1975 1118 895 1936 158 1439 425 328 591 513 995 2068 1019
Gibraltar 4485 1172 2256 2224 2047 2436 3196 1975 0 2897 2428 676 1817 698 1693 2185 2565 1971 2631 3886 2974
Hamburg 2977 2018 597 714 1115 460 460 1118 2897 0 550 2671 1159 2198 1479 1238 805 877 1751 949 1155
Hook of Holland 3030 1490 172 330 731 269 269 895 2428 550 0 2280 863 1730 1183 1098 851 457 1683 1500 1205
Lisbon 4532 1305 2084 2052 1827 2290 2971 1936 676 2671 2280 0 1178 668 1762 2250 2507 1799 2700 3231 2937
Lyons 2753 645 690 739 789 714 1458 158 1817 1159 863 1178 0 1281 320 328 724 471 1048 2108 1157
Madrid 3949 636 1558 1550 1347 1764 2498 1439 698 2198 1730 668 1281 0 1157 1724 2010 1273 2097 3188 2409
Marseilles 2865 521 1011 1059 1101 1035 1778 425 1693 1479 1183 1762 320 1157 0 618 1109 792 1011 2428 1363
Milan 2282 1014 925 1077 1209 911 1537 328 2185 1238 1098 2250 328 1724 618 0 331 856 586 2187 898
Munich 2179 1365 747 977 1160 583 1104 591 2565 805 851 2507 724 2010 1109 331 0 821 946 1754 428
Paris 3000 1033 285 280 340 465 1176 513 1971 877 457 1799 471 1273 792 856 821 0 1476 1827 1249
Rome 817 1460 1511 1662 1794 1497 2050 995 2631 1751 1683 2700 1048 2097 1011 586 946 1476 0 2707 1209
Stockholm 3927 2868 1616 1786 2196 1403 650 2068 3886 949 1500 3231 2108 3188 2428 2187 1754 1827 2707 0 2105
Vienna 1991 1802 1175 1381 1588 937 1455 1019 2974 1155 1205 2937 1157 2409 1363 898 428 1249 1209 2105 0
My Question: Suppose I have 6 cities and the Longitude/Latitude for each of these cities :
data_1 = data.frame(id = c(1,2,3), long = rnorm(3, -74, 1 ), lat = rnorm(3, 40, 1 ))
data_2 = data.frame(id = c(4,5,6), long = rnorm(3, -78, 1 ), lat = rnorm(3, 42, 1 ))
final_data = rbind(data_1, data_2)
final_data$names <- c("city_1", "city_2", "city_3", "city_4", "city_5", "city_6")
id long lat names
1 1 -75.28447 40.21079 city_1
2 2 -73.29385 40.09104 city_2
3 3 -75.12737 38.88355 city_3
4 4 -79.42325 42.61917 city_4
5 5 -77.82508 41.11707 city_5
6 6 -77.62831 39.94935 city_6
I can also make a similar matrix for these cities that contains the distance between each pair of cities:
library(geosphere)
N <- nrow(final_data)
dists <- outer(seq_len(N), seq_len(N), function(a,b) {
geosphere::distHaversine(final_data[a,2:3], final_data[b,2:3]) # Notes 1, 2
})
D <- as.matrix(dists)
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] 0.0 169895.7 148361.1 437239.3 237056.7 201742.0
[2,] 169895.7 0.0 207068.8 584183.9 399577.9 369814.4
[3,] 148361.1 207068.8 0.0 551356.0 338698.3 245620.3
[4,] 437239.3 584183.9 551356.0 0.0 213326.6 332955.7
[5,] 237056.7 399577.9 338698.3 213326.6 0.0 131051.7
[6,] 201742.0 369814.4 245620.3 332955.7 131051.7 0.0
How can I make my matrix look the same way as the "eurodist" matrix?
I had thought of the following way to do this:
colnames(dists) <- c("city_1", "city_2", "city_3", "city_4", "city_5", "city_6")
rownames(dists) <- c("city_1", "city_2", "city_3", "city_4", "city_5", "city_6")
city_1 city_2 city_3 city_4 city_5 city_6
city_1 0.0 169895.7 148361.1 437239.3 237056.7 201742.0
city_2 169895.7 0.0 207068.8 584183.9 399577.9 369814.4
city_3 148361.1 207068.8 0.0 551356.0 338698.3 245620.3
city_4 437239.3 584183.9 551356.0 0.0 213326.6 332955.7
city_5 237056.7 399577.9 338698.3 213326.6 0.0 131051.7
city_6 201742.0 369814.4 245620.3 332955.7 131051.7 0.0
In the end, I would like to use the above matrix as input for a customized Travelling Salesman Problem (R: Customizing the Travelling Salesman Problem) - e.g. Try to find the optimal path when you are forced to start at "city 4" and the third city should be "city 5":
D <- dists
transformMatrix <- function(fixed_points, D){
if(length(fixed_points) == 0) return(D)
p <- integer(nrow(D))
pos <- match(names(fixed_points), colnames(D))
p[fixed_points] <- pos
p[-fixed_points] <- sample(setdiff(seq_len(nrow(D)), pos))
D[p, p]
}
fixed_points <- c(
"city_4" = 1, "city_5" = 3
)
D_perm <- transformMatrix(fixed_points, D)
feasiblePopulation <- function(n, size, fixed_points){
positions <- setdiff(seq_len(n), fixed_points)
m <- matrix(0, size, n)
if(length(fixed_points) > 0){
m[, fixed_points] <- rep(fixed_points, each = size)
for(i in seq_len(size))
m[i, -fixed_points] <- sample(positions)
} else {
for(i in seq_len(size))
m[i,] <- sample(positions)
}
m
}
mutation <- function(n, fixed_points){
positions <- setdiff(seq_len(n), fixed_points)
function(obj, parent){
vec <- obj@population[parent,]
if(length(positions) < 2) return(vec)
indices <- sample(positions, 2)
replace(vec, indices, vec[rev(indices)])
}
}
fitness <- function(tour, distMatrix) {
tour <- c(tour, tour[1])
route <- embed(tour, 2)[,2:1]
1/sum(distMatrix[route])
}
library(GA)
popSize = 500
res <- ga(
type = "permutation",
fitness = fitness,
distMatrix = D_perm,
lower = 1,
upper = nrow(D_perm),
mutation = mutation(nrow(D_perm), fixed_points),
crossover = gaperm_pmxCrossover,
suggestions = feasiblePopulation(nrow(D_perm), popSize, fixed_points),
popSize = popSize,
maxiter = 5000,
run = 500,
pmutation = 0.2
)
colnames(D_perm)[res@solution[1,]]
This results in the following error:
Error in if (object@run >= run) break :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: In max(fitness) : no non-missing arguments to max; returning -Inf
2: In max(Fitness, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
3: In max(fitness) : no non-missing arguments to max; returning -Inf
4: In max(x, na.rm = TRUE) :
no non-missing arguments to max; returning -Inf
Is the above error because I have not made the distance matrix (i.e. "D") properly? Is there a different way to name the columns and rows of a matrix in R?
Thanks!
Note: If anyone knows another way to solve this constrained Travelling Salesman Problem with custom cities using the Genetic Algorithm in R (e.g. a different objective function, a different way to specify constraints, etc.), please let me know. I am open to different ways of solving this problem!
That’s not the problem. The error says that it encountered the code:
if (object@run >= run) break
… and the comparison evaluated to NA (either object@run or run was NA), which the if function cannot handle gracefully. It may be an error in the ga function itself or in the arguments to it.
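The warnings about max() having no non-missing arguments suggest that the fitness values themselves were missing. As a small check, using a hypothetical 3-city matrix (D_toy and D_na are made up for illustration), the question's fitness function returns the reciprocal of the closed tour length, and a single NA distance on the route makes the whole value NA. Whether this is what happened in the question's run is an assumption.

```r
# Fitness function copied from the question: reciprocal of the closed tour length.
fitness <- function(tour, distMatrix) {
  tour <- c(tour, tour[1])            # return to the starting city
  route <- embed(tour, 2)[, 2:1]      # consecutive (from, to) index pairs
  1 / sum(distMatrix[route])          # matrix indexing picks one distance per leg
}

# Hypothetical symmetric distance matrix for three cities.
D_toy <- matrix(c(0, 1, 2,
                  1, 0, 3,
                  2, 3, 0), nrow = 3, byrow = TRUE)

fitness(c(1, 2, 3), D_toy)   # legs 1->2, 2->3, 3->1 have lengths 1, 3, 2

# The same matrix with one missing distance on the route.
D_na <- D_toy
D_na[2, 3] <- NA
fitness(c(1, 2, 3), D_na)    # NA propagates through sum()
```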
To address the direct question about how to make the distance matrix look like the example in eurodist: There is a dimnames attribute for matrices. You need to assign a list with a rownames and a colnames value in it and assign that list to the dimnames attribute.
dimnames(D) <- list(rownames=final_data$names,
colnames=final_data$names)
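A minimal sketch of the same idea on a hypothetical 3-city matrix (the distances and the name city_names are made up for illustration). Assigning rownames() and colnames() separately, as in the question, sets the same dimnames attribute.

```r
# Hypothetical distances between three cities.
D <- matrix(c(0, 10, 20,
              10, 0, 15,
              20, 15, 0), nrow = 3, byrow = TRUE)

city_names <- c("city_1", "city_2", "city_3")

# One assignment names both dimensions: first element = rows, second = columns.
dimnames(D) <- list(city_names, city_names)

D["city_1", "city_3"]          # look up a distance by name
match("city_2", colnames(D))   # position of a city, as transformMatrix() does
```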
Then when you run your code you get an error from the ga(...) call:
Error in gaperm_pmxCrossover_Rcpp(object, parents) : index error
Looking at the problem setup, your population size appears much larger than needed. If you drop it down a bit to say 100 or 200, then the results begin to be computed.
popSize=200
# now calculate a res
colnames(D_perm)[res@solution[1,]]
#[1] "city_4" "city_6" "city_5" "city_1" "city_3" "city_2"
popSize=100
colnames(D_perm)[res@solution[1,]]
#[1] "city_4" "city_6" "city_5" "city_1" "city_3" "city_2"
popSize=20
colnames(D_perm)[res@solution[1,]]
#[1] "city_4" "city_6" "city_5" "city_1" "city_3" "city_2"
It doesn't seem "proper" that a population size larger than needed should cause an obscure error, so you might contact the package maintainer with your example (now that it has been "dressed up" properly.)

Filter max values of each column and output as a df Using R [duplicate]

I have a data frame as below:
Nucleotides pos_1 pos_2 pos_3 pos_4 pos_5 pos_6 pos_7 pos_8 pos_9 pos_10 pos_11 pos_12 pos_13 pos_14 pos_15
1 A 839 1344 1151 1047 1145 770 1185 1048 782 1326 806 897 895 961 960
2 C 410 511 834 1072 688 695 836 884 643 865 853 1025 697 719 790
3 G 147 1313 955 1074 1262 1131 880 873 749 746 1260 751 771 1136 1219
4 T 2573 801 1029 776 874 1373 1068 1164 1795 1032 1050 1296 1606 1153 1000
pos_16 pos_17 pos_18 pos_19 pos_20
1 1183 802 955 712 568
2 1017 843 582 429 581
3 712 722 925 1399 1779
4 1057 1602 1507 1429 1041
I want to have the maximum value of each column as a final data frame. So basically the final df should have a single row (regardless of Nucleotides) which includes max values for each column.
Easy with apply().
data("mtcars")
df = data.frame( var = names(mtcars), max_values = apply(mtcars, 2, max), row.names = NULL )
> df
var max_values
1 mpg 33.900
2 cyl 8.000
3 disp 472.000
4 hp 335.000
5 drat 4.930
6 wt 5.424
7 qsec 22.900
8 vs 1.000
9 am 1.000
10 gear 5.000
11 carb 8.000
Make sure you use apply() with max() on numeric columns only; a non-numeric column such as Nucleotides has to be dropped first.
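Applied to the data in the question, a minimal sketch might look like this. Only the first three position columns are reproduced, and the object names nuc, num_cols and max_row are chosen here for illustration.

```r
# First three position columns of the data frame in the question.
nuc <- data.frame(
  Nucleotides = c("A", "C", "G", "T"),
  pos_1 = c(839, 410, 147, 2573),
  pos_2 = c(1344, 511, 1313, 801),
  pos_3 = c(1151, 834, 955, 1029)
)

# Keep only the numeric columns, then take the maximum of each one.
num_cols <- sapply(nuc, is.numeric)
max_row <- as.data.frame(lapply(nuc[num_cols], max))
max_row   # a single-row data frame, one column per position
```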

Converting SAS Proc Shewhart into R programming

I have data on which SAS PROC SHEWHART has been run, and I want to implement the same analysis in R. The data and the SAS code are below:
valueid date dis_id sales_amount yymm (year month)
868 5-Mar-18 2 956 1803
868 6-Apr-17 2 473 1704
868 22-Dec-16 2 524 1612
914 17-Dec-15 2 1768 1512
914 18-Aug-16 2 477 1608
914 12-Jan-17 2 804 1701
870 1-May-17 2 1373 1705
870 8-Sep-17 2 323 1709
870 29-Feb-16 2 1718 1602
870 26-Jan-16 2 1242 1601
870 1-Apr-16 2 995 1604
800 22-Apr-16 2 356 1604
925 10-May-16 2 1487 1605
928 30-May-16 2 1210 1605
928 29-Jun-16 2 1935 1606
928 28-Nov-16 2 1149 1611
928 13-Dec-16 2 835 1612
987 10-Jul-17 2 1023 1707
987 27-Jul-17 2 389 1707
987 22-Sep-17 2 1191 1709
Below is the program used to produce the XSCHART output:
proc shewhart data=sales_revenue;
by valueid;
xschart sales_amount*yymm/ nochart outtable= newoutput;
id dis_id;
run;
I need to convert this PROC SHEWHART step with XSCHART into R. Any help would be appreciated.
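As a starting point, and not a full translation, the subgroup statistics that an X-bar and s chart is built from can be sketched in base R: one subgroup per yymm within each valueid, with its size, mean and standard deviation. Only five rows of the data above are used, the names grp, summarise_by and subgroups are hypothetical, and the control limits that SAS writes to the OUTTABLE= data set are not reproduced here.

```r
# A few rows of the data above (valueid 987 and 928 only).
sales_revenue <- data.frame(
  valueid      = c(987, 987, 987, 928, 928),
  yymm         = c(1707, 1707, 1709, 1605, 1606),
  sales_amount = c(1023, 389, 1191, 1210, 1935)
)

# BY valueid plus the subgroup variable yymm define the subgroups.
grp <- list(valueid = sales_revenue$valueid, yymm = sales_revenue$yymm)

# Helper: apply one summary function to sales_amount within each subgroup.
summarise_by <- function(f) {
  aggregate(sales_revenue["sales_amount"], by = grp, FUN = f)$sales_amount
}

subgroups <- aggregate(sales_revenue["sales_amount"], by = grp, FUN = length)
names(subgroups)[3] <- "n"            # subgroup sample size
subgroups$mean <- summarise_by(mean)  # subgroup mean (X-bar chart)
subgroups$sd <- summarise_by(sd)      # subgroup standard deviation (s chart)
subgroups
```

Subgroups with a single observation get NA for the standard deviation, which is worth keeping in mind because most year-month subgroups in the data above contain only one sale.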

Rolling function with conditions

The objective is to calculate the Cum.Buy variable, a rolling sum of Buy.Volume over all rows whose Volume.Cumsum is at most 10000 less than the current row's value. For example, for Row 622, Cum.Buy sums the Buy.Volume of the current row (6000) and the Buy.Volume of all earlier rows whose Volume.Cumsum lies between 9973 - 10000 = -27 and 9973, which includes Row 566. So Cum.Buy equals 6000 + 2718 = 8718. Secondly, using the same window, I would like to get the initial price (Price.Lag).
df <- read.table(text="
Row Price Volume Volume.Cumsum Buy.Volume
550 6.249 151 151 0
554 6.249 1104 1255 0
566 6.250 2718 3973 2718
622 6.250 6000 9973 6000
652 6.249 745 10718 0
653 6.249 1292 12010 0
661 6.250 5918 17928 5918
663 6.250 13130 31058 13130
664 6.250 4894 35952 4894
673 6.250 4894 40846 4894", header=T)
I have tried the following function, but I am getting a binary operator error:
BV <- function(x) {
if (is.null(dim(x))) x <- t(x)
tt <- x[, "Volume.Cumsum"]
sum(x[tt >= tail(tt, 1) - 10000, "Buy.Volume"])
}
library(zoo)
df = transform(df,Cum.Buy=rollapplyr(df,100,BV,by.column=FALSE,partial=TRUE))
The resulting output should look like:
df <- read.table(text="
Row Price Volume Volume.Cumsum Buy.Volume Cum.Buy Price.Lag
550 6.249 151 151 0 0 6.249
554 6.249 1104 1255 0 0 6.249
566 6.250 2718 3973 2718 2718 6.249
622 6.250 6000 9973 6000 8718 6.249
652 6.249 745 10718 0 8718 6.249
653 6.249 1292 12010 0 8718 6.250
661 6.250 5918 17928 5918 11918 6.250
663 6.251 13130 31058 13130 13130 6.251
664 6.250 4894 35952 4894 18024 6.251
673 6.250 4894 40846 4894 22918 6.251", header=T)
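One way to sketch the calculation without a rolling-window package is to loop over the rows in base R. The helper name window_rows is hypothetical, and Price.Lag is taken here to mean the Price of the earliest row inside the window, which is an assumption about what "initial price" means.

```r
df <- read.table(text = "
Row Price Volume Volume.Cumsum Buy.Volume
550 6.249 151 151 0
554 6.249 1104 1255 0
566 6.250 2718 3973 2718
622 6.250 6000 9973 6000
652 6.249 745 10718 0
653 6.249 1292 12010 0
661 6.250 5918 17928 5918
663 6.250 13130 31058 13130
664 6.250 4894 35952 4894
673 6.250 4894 40846 4894", header = TRUE)

# Indices of the rows up to i whose Volume.Cumsum is within 10000 of row i.
window_rows <- function(i) {
  idx <- seq_len(i)
  idx[df$Volume.Cumsum[idx] >= df$Volume.Cumsum[i] - 10000]
}

rows <- seq_len(nrow(df))
# Rolling sum of Buy.Volume over each window.
df$Cum.Buy <- sapply(rows, function(i) sum(df$Buy.Volume[window_rows(i)]))
# Price of the earliest row in each window (assumed meaning of "initial price").
df$Price.Lag <- sapply(rows, function(i) df$Price[window_rows(i)[1]])
df
```

This loop is quadratic in the number of rows, which is fine for a table of this size but may be slow on long tick data.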

Barplot using three columns

The data in the table is given below:
Year NSW Vic. Qld SA WA Tas. NT ACT Aust.
1 1917 1904 1409 683 440 306 193 5 3 4941
2 1927 2402 1727 873 565 392 211 4 8 6182
3 1937 2693 1853 993 589 457 233 6 11 6836
4 1947 2985 2055 1106 646 502 257 11 17 7579
5 1957 3625 2656 1413 873 688 326 21 38 9640
6 1967 4295 3274 1700 1110 879 375 62 103 11799
7 1977 5002 3837 2130 1286 1204 415 104 214 14192
8 1987 5617 4210 2675 1393 1496 449 158 265 16264
9 1997 6274 4605 3401 1480 1798 474 187 310 18532
I want to plot a graph with Year on the x-axis and the total value on the y-axis. The barplot should show the ACT and NT values for each year.
I tried the following command:
barplot(as.matrix(r_data$ACT, r_data$NT), main="r_data", ylab="Total", beside=TRUE)
The above command showed the bars for the ACT column for each year but did not show the bars for the NT column.
You have to create the matrix in a different way:
barplot(as.matrix(r_data[c("ACT", "NT")]),
main="r_data", ylab="Total", beside=TRUE)
You can also use cbind instead of as.matrix and keep the rest of your original approach:
barplot(cbind(r_data$ACT, r_data$NT),
main="r_data", ylab="Total", beside=TRUE)
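To get one group of bars per year on the x-axis, as the question asks, the matrix can be transposed so that each year becomes a column. This is a sketch using only the Year, NT and ACT columns from the table above; the object name counts is chosen here for illustration.

```r
# Year, NT and ACT columns from the table in the question.
r_data <- data.frame(
  Year = seq(1917, 1997, by = 10),
  NT   = c(5, 4, 6, 11, 21, 62, 104, 158, 187),
  ACT  = c(3, 8, 11, 17, 38, 103, 214, 265, 310)
)

# With beside = TRUE, each matrix column is a group and each row is a bar
# within the group, so transpose to get one group per year.
counts <- t(as.matrix(r_data[c("ACT", "NT")]))

barplot(counts, beside = TRUE,
        names.arg = r_data$Year,            # label each group with its year
        legend.text = rownames(counts),     # ACT / NT legend
        args.legend = list(x = "topleft"),
        main = "r_data", xlab = "Year", ylab = "Total")
```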
