rowSums across a specific row in a matrix - R

final.marks
# raj sanga rohan rahul
#physics 45 43 44 49
#chemistry 47 45 48 47
#total 92 88 92 96
This is the matrix I have. I want to find the total for each subject separately across its row and add the totals to the matrix as a 5th column. However, my code, class.marks.chemistry <- rowSums(final.marks[2,]), keeps producing this error:
Error in rowSums(final.marks[2, ]) :
'x' must be an array of at least two dimensions
Can you please help me solve it? I am very new to R and have no scripting or programming background.

Do you mean this?
# Sample data
df <- read.table(text =
" raj sanga rohan rahul
physics 45 43 44 49
chemistry 47 45 48 47
total 92 88 92 96", header = TRUE)
# Add column total with row sum
df$total <- rowSums(df)
df
# raj sanga rohan rahul total
#physics 45 43 44 49 181
#chemistry 47 45 48 47 187
#total 92 88 92 96 368
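rowSums(df) also works if df is a matrix instead of a data.frame, although the new column then has to be attached with cbind(), because $ assignment does not work on a matrix.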
If you look at ?rowSums you can see that the x argument needs to be
an array of two or more dimensions, containing numeric,
complex, integer or logical values, or a numeric data frame.
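So in your case we must pass a two-dimensional object as the argument. Indexing a single row with final.marks[2, ] drops the result to a plain vector, which has no dimensions, hence the error. Pass the entire data.frame (or matrix), or keep the row two-dimensional with final.marks[2, , drop = FALSE].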
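To make the dimension issue concrete, here is a minimal sketch that rebuilds the subject rows of the asker's matrix by hand (the construction with matrix() and the variable names chem_vec, chemistry_total and with_total are assumptions for illustration, not the asker's actual code):

```r
# Rebuild the two subject rows of the matrix from the question
final.marks <- matrix(c(45, 43, 44, 49,
                        47, 45, 48, 47),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(c("physics", "chemistry"),
                                      c("raj", "sanga", "rohan", "rahul")))

# Selecting one row drops the dimensions, leaving a plain named vector
chem_vec <- final.marks[2, ]
is.null(dim(chem_vec))

# drop = FALSE keeps a 1 x 4 matrix, so rowSums() accepts it
chemistry_total <- rowSums(final.marks[2, , drop = FALSE])

# Row totals for every subject, attached as a 5th column
with_total <- cbind(final.marks, total = rowSums(final.marks))
with_total
```

For a single row, sum(final.marks[2, ]) would give the same number without any need for rowSums().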

Another option would be to use addmargins on a matrix
addmargins(as.matrix(df), 2)
# raj sanga rohan rahul Sum
#physics 45 43 44 49 181
#chemistry 47 45 48 47 187
#total 92 88 92 96 368

Related

R - Read Matrix Data from file with free spaces

I have a map of efficiency data stored as a matrix in a text file. As the row number increases, data points are missing at the end of the line because no data are available. When I try to read the data as a matrix in R:
mymatrix = as.matrix(read.table(file="~/Desktop/Map_.txt"))
I get an error message saying that row 3 (in this case the first row with missing values at its end) does not have all the expected values.
Is there a way to read the map file as a matrix even when some lines have empty positions (no values)?
Thank you!
example:
1000 2000 3000 4000 5000
-1 85 75 65 60 58
-2 86 74 64 58 52
-3 83 78 68 59
-4 86 80 72
-5 86 81 71
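One possible approach, sketched here on the example lines above, is the fill argument of read.table, which pads short rows with NA. The inline text (used instead of the file on disk) and the name map_text are assumptions made so the sketch is self-contained:

```r
# Example map from the question, inlined instead of read from a file
map_text <- "1000 2000 3000 4000 5000
-1 85 75 65 60 58
-2 86 74 64 58 52
-3 83 78 68 59
-4 86 80 72
-5 86 81 71"

# fill = TRUE pads rows that are too short with NA.
# The header has one field fewer than the data rows, so the first
# column is used as row names.
mymatrix <- as.matrix(read.table(text = map_text, header = TRUE,
                                 fill = TRUE, check.names = FALSE))
mymatrix
```

With a real file, text = map_text would be replaced by file = "~/Desktop/Map_.txt". Note that fill = TRUE only pads at the end of a line, so it cannot recover values missing from the middle of a row.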

R sum multiple columns with multiple row

I have this data:
10 21 22 23 23 43
20 12 26 43 23 65
21 54 64 73 25 75
My expected outcome is:
142
189
312
I tried to use:
df = data.matrix(df)
df = colSums(df)
df = as.data.frame(df)
However, the sums are wrong. How can I improve or correct this solution?
We can use rowSums
rowSums(df)
#[1] 142 189 312
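Your data is stored as factors. data.matrix() replaces factors with their internal integer codes rather than the values they display, which is why the sums are wrong. You must convert the columns to numeric using as.numeric(as.character()).
In your situation I suggest converting column by column (applying as.character() to a whole data frame row is not reliable for factors):
df[] <- lapply(df, function(x) as.numeric(as.character(x)))
rowSums(df)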
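A minimal sketch of the factor conversion, assuming the columns really were read in as factors (colClasses = "factor" is used here only to reproduce that situation; the name row_totals is illustrative):

```r
# Read the sample data, forcing every column to be a factor
df <- read.table(text = "10 21 22 23 23 43
20 12 26 43 23 65
21 54 64 73 25 75", colClasses = "factor")

# Convert each factor column back to the numbers it displays:
# as.character() recovers the labels, as.numeric() parses them
df[] <- lapply(df, function(col) as.numeric(as.character(col)))

# Sum across each row
row_totals <- rowSums(df)
row_totals
```

Calling as.numeric() directly on a factor, without as.character(), would return the level codes instead of the values, which is the same mistake data.matrix() makes here.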

Display values on heatmap in R

I am working on a heatmap using heatmap.2 and would like to know if there is any way to display the values on all heatmap positions. For example, for the cell for row "1" and column rating I would like to display the value 43, for row "1" and complaints the value 51, and so on.
My sample data is as follows:
rating complaints privileges learning raises critical advance
1 43 51 30 39 61 92 45
2 63 64 51 54 63 73 47
3 71 70 68 69 76 86 48
4 61 63 45 47 54 84 35
Is this what you mean? By providing the data object as the cellnote argument, the values are printed in the heatmap.
library(gplots) # provides heatmap.2
heatmap.2(data, # data must be a numeric matrix
cellnote=data, # cell labeling
notecex=1.0,
notecol="cyan",
na.color=par("bg"))
This answers only the follow-up question "For cell labeling, is there any way not to display values that are 0?":
cellnote=ifelse(data==0, NA, data) will work as you want.
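As a rough sketch of how the cellnote masking behaves, the block below uses the sample values from the question with one cell set to 0. That zero, and the names scores and cell_labels, are hypothetical additions for illustration. The plot itself is only drawn if the gplots package is installed:

```r
# Sample values from the question as a numeric matrix
scores <- matrix(c(43, 51, 30, 39, 61, 92, 45,
                   63, 64, 51, 54, 63, 73, 47,
                   71, 70, 68, 69, 76, 86, 48,
                   61, 63, 45, 47, 54, 84, 35),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(1:4,
                                 c("rating", "complaints", "privileges",
                                   "learning", "raises", "critical", "advance")))

# Hypothetical zero, added only to show the masking
scores[1, "rating"] <- 0

# ifelse() keeps the matrix shape; zeros become NA and are not printed
cell_labels <- ifelse(scores == 0, NA, scores)

if (requireNamespace("gplots", quietly = TRUE)) {
  gplots::heatmap.2(scores,
                    cellnote = cell_labels,  # text drawn in each cell
                    notecex = 1.0,
                    notecol = "black",
                    na.color = par("bg"),
                    trace = "none")          # hide the default trace lines
}
```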
In Python, seaborn.heatmap displays all the values on the heatmap when called with annot=True.

R 2.15.0 lm() on Windows 6.1.7601 - Subsetting data frame receive error when labeling columns

Purpose: Subset a dataframe into 2 columns that are labeled with new names
for example:
Age Height
1 65 183
2 73 178
[data1[dataset1$Age>50 | dataset1$Height>140,], c("Age","Cm")]
# Error: unexpected ',' in "data1[data1$Age>50 | data1$Height>140,],"
What I've tried:
data1[dataset1$Age>50 | dataset1$Height>140,] #This doesn't organize results in columns
data1[dataset1$Age>50 | dataset1$Height>140,], c("Age","Cm") #Returns same error
I can't get the columns to be organized side-by-side with the labels in c("label1", "label2"). Thanks for your help! New to R and learning it alongside biostats.
If I understand correctly, the subset function may help:
dataset1 <- data.frame(
age=c(44,77,21,55,66,90,23,54,31),
height=c(144,177,121,155,166,190,123,154,131)
)
data1 <- subset(dataset1, age > 50 | height > 140)
colnames(data1) <- c("Age", "Height")
I may have missed what you were trying to do; a bit more reproducible data would help, I think.
Nevertheless, I had a go:
dataset1 = data.frame(cbind((35:75),(135:175)))
colnames(dataset1) = c("Age","Height")
Age Height
35 135
36 136
37 137
38 138
39 139
40 140
41 141
42 142
43 143
44 144
and subset
data1 = dataset1[dataset1$Age>50 | dataset1$Height>140,]
colnames(data1) = c("Age","Cm")
Age Cm
41 141
42 142
43 143
44 144
45 145
46 146
47 147
48 148
49 149
50 150
My apologies if I missed what you wanted but to me it wasn't very clear.
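For reference, the syntax error in the question appears to come from the bracket placement: the row condition and the column selection both belong inside one pair of square brackets, as df[rows, columns]. A minimal sketch, reusing the sample values from the first answer (the capitalised column names and the helper name keep are assumptions):

```r
dataset1 <- data.frame(
  Age    = c(44, 77, 21, 55, 66, 90, 23, 54, 31),
  Height = c(144, 177, 121, 155, 166, 190, 123, 154, 131)
)

# Logical vector marking the rows to keep
keep <- dataset1$Age > 50 | dataset1$Height > 140

# Rows first, then columns, inside a single pair of brackets
data1 <- dataset1[keep, c("Age", "Height")]

# Relabel the selected columns afterwards
names(data1) <- c("Age", "Cm")
data1
```

The column vector inside the brackets must use names that already exist in the data frame. New labels such as "Cm" can only be assigned after the selection.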

Transpose with multiple variables and more than one metrics in R

I was previously a SAS user; since I no longer have SAS, I need to learn R for work.
The dataset has the following column:
market date sitename impression clicks
I want to transpose it into:
market date sitename-impression sitename-clicks
I think in SAS I used to do:
Proc Transpose
by market date;
id sitename;
var impression clicks;
run;
I do have a book on R and googled a lot, but couldn't find the solution that works...
Would really appreciate if anyone can help.
Thanks in advance!!!
Let me start by saying welcome to Stack Overflow. Glad to have a new user. When you ask a question, it's helpful and encouraged to provide the code you're using and a reproducible data set that looks like the original. This is called a minimal reproducible example. To get a data set into a post you have several options; here are two: call dput() on the object and paste what is displayed in the console, or post the data frame directly. For the code, provide everything necessary to replicate your problem. I hope you find this helpful for future questions.
I may not fully understand but I think you want to transform, not transpose, the data.
dat <- data.frame(market=rnorm(10), date=rnorm(10), #let's create a data set
sitename=rnorm(10), impression=rnorm(10), clicks=rnorm(10))
dat #look at it (I pasted it below)
# > dat
# market date sitename impression clicks
# 1 -0.9593797 -0.08411994 1.6079129 -0.5204772 -0.31633966
# 2 -0.5088689 1.78799500 -0.2469315 1.3476964 -0.04344779
# 3 -0.1527465 0.81673996 1.7824969 -1.5531260 -1.28304384
# 4 -0.7026194 0.52072913 -0.1174356 0.5722210 -1.20474443
# 5 -0.4537490 -0.69139062 1.1124277 -0.2452974 -0.33025320
# 6 0.7466588 0.36318337 -0.4623319 -0.9036768 -0.65754302
# 7 0.8007612 2.59588554 0.1820732 0.4318629 -0.36308748
# 8 1.0781715 -1.01512734 0.2297475 0.9219439 -1.15687902
# 9 0.3731450 -0.19004572 0.5190749 -1.4020371 -0.97370295
# 10 0.7724259 1.76528303 0.5781786 -0.5490849 -0.83819036
#now to create the new columns (I think this is what you want)
#the easiest way is to use transform. See ?transform for more
dat.new <- transform(dat, sitename.clicks=sitename-clicks,
impression.clicks=impression-clicks)
dat.new #here's the new data set. Notice it has the new and old columns.
#To get rid of the old columns you can use indexing and specify the columns you want.
dat.new[, c(1:2, 6:7)]
#We could have also done:
dat.new[, c(1,2,6,7)]
#or said the columns not wanted with negative indexing:
dat.new[, -c(3:5)]
EDIT: Looking at Brian's comments and the variables, I now think that a long to wide transformation is what the poster wants. I would likely approach it using Wickham's reshape2 package as well, as that method is easier for me to work with and I imagine it would be easier for an R beginner too. However, here is a base R way to do the long to wide conversion using the same data set Brian provided:
wide <- reshape(DF, v.names=c("impression", "clicks"), idvar=c("market", "date"),
timevar="sitename", direction="wide")
reshape(wide) #calling reshape() again on the result converts it back to long format
The reshape function is very flexible but takes some getting used to. I'm leaving my previous response up to keep the history of this post, though I now believe it is not the poster's intent. It serves as a reminder that a reproducible example is very helpful in providing clarity to your query.
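To show what the base reshape() call produces, here is a small sketch on a hand-made data set with two markets, one date and two sites. The values and the name small_df are illustrative assumptions; only the reshape() arguments come from the answer above:

```r
# Tiny long-format data set: one row per market/date/sitename
small_df <- data.frame(
  market     = c("A", "A", "B", "B"),
  date       = as.Date("2012-02-28"),
  sitename   = c("a", "b", "a", "b"),
  impression = c(74, 11, 4, 74),
  clicks     = c(97, 71, 53, 9)
)

# One row per market/date, one impression and clicks column per site
wide <- reshape(small_df,
                v.names   = c("impression", "clicks"),
                idvar     = c("market", "date"),
                timevar   = "sitename",
                direction = "wide")
wide
```

reshape() builds the new column names by joining each v.names entry and each sitename value with a dot, for example impression.a and clicks.b.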
Example data, as Tyler said, is important. I interpreted your question differently because I thought your data was different. I didn't take the - as a literal subtraction of numerics, but as a combination of variable names.
DF <- expand.grid(market = LETTERS[1:5],
date = Sys.Date()+(0:5),
sitename = letters[1:2])
n <- nrow(DF)
DF$impression <- sample(100, n, replace=TRUE)
DF$clicks <- sample(100, n, replace=TRUE)
I find the reshape2 package useful for these sorts of transpositions/transformations/rearrangements.
library("reshape2")
dcast(melt(DF, id.vars=c("market","date","sitename")),
market+date~sitename+variable)
gives
market date a_impression a_clicks b_impression b_clicks
1 A 2012-02-28 74 97 11 71
2 A 2012-02-29 34 30 88 35
3 A 2012-03-01 40 85 40 49
4 A 2012-03-02 46 12 99 20
5 A 2012-03-03 6 95 85 56
6 A 2012-03-04 61 61 42 64
7 B 2012-02-28 4 53 74 9
8 B 2012-02-29 43 27 92 59
9 B 2012-03-01 34 26 86 43
10 B 2012-03-02 81 47 84 35
11 B 2012-03-03 3 5 91 48
12 B 2012-03-04 19 26 99 21
13 C 2012-02-28 22 31 100 53
14 C 2012-02-29 40 83 95 27
15 C 2012-03-01 78 89 81 29
16 C 2012-03-02 57 55 79 87
17 C 2012-03-03 37 61 3 97
18 C 2012-03-04 83 61 41 77
19 D 2012-02-28 81 18 47 3
20 D 2012-02-29 90 100 17 83
21 D 2012-03-01 12 40 35 93
22 D 2012-03-02 85 14 63 67
23 D 2012-03-03 63 53 29 58
24 D 2012-03-04 40 79 56 70
25 E 2012-02-28 97 62 68 31
26 E 2012-02-29 24 84 17 63
27 E 2012-03-01 94 93 32 2
28 E 2012-03-02 6 26 86 26
29 E 2012-03-03 100 34 37 80
30 E 2012-03-04 89 87 72 11
The column names have a _ between them rather than a -, but you can change that if you want. I wouldn't recommend it, though, because then you will have problems later referencing the column since the - will be taken as subtraction (you would need to quote the name).
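A short sketch of the quoting issue mentioned above, using a hypothetical one-column data frame (the names hyphenated, quoted and bracketed are illustrative):

```r
# Hypothetical data frame whose column name contains a hyphen
hyphenated <- data.frame(x = c(97, 30))
names(hyphenated) <- "a-clicks"

# Without quoting, hyphenated$a-clicks would be parsed as
# (hyphenated$a) - clicks, so the name has to be quoted
quoted    <- hyphenated$`a-clicks`      # backticks with $
bracketed <- hyphenated[["a-clicks"]]   # or a string with [[ ]]
identical(quoted, bracketed)
```

Names with an underscore, such as a_clicks, need no quoting at all, which is the reason for the recommendation to keep them.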
