R: Merge part of a table into one column with a sum

I have the following table in R:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17
162 148 108 93 67 83 44 53 37 47 25 34 17 22 11 11 5
I want to divide it into 7 parts titled 1 2 3 4 5 6 and 7&greater, where the last part combines (sums) all the counts from 7 onwards.
I have looked at aggregate and tapply, but neither seems to be the right function.

Keep the first six counts and append the sum of all remaining counts as a new named element:
x <- c(x[1:6], "7 and above" = sum(x[-(1:6)]))
x
1 2 3 4 5 6 7 and above
162 148 108 93 67 83 306
Data:
x <- table(rep(c(1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17), c(162,148,108,93,67,83,44,53,37,47,25,34,17,22,11,11,5)))
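Since the question mentions tapply: it can do this as well, provided the grouping vector caps every value at 7. A minimal sketch, assuming x is the table built in the data line above; grp and res are hypothetical names.

```r
# Rebuild the example table from the question
x <- table(rep(1:17, c(162, 148, 108, 93, 67, 83, 44, 53, 37, 47, 25, 34,
                       17, 22, 11, 11, 5)))

# Group label for each cell of the table: its value, capped at 7
grp <- pmin(as.numeric(names(x)), 7)

# Sum the counts within each group; values 7..17 all fall into group 7
res <- tapply(as.vector(x), grp, sum)
names(res)[7] <- "7 and above"
res
```

The grouping vector does the real work here, so the same idea carries over to aggregate if the counts are stored in a data frame.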

If you are using table to generate the output above, you can use pmin to take the smaller of each value in your data and 7, and then use table to count the frequencies.
Assuming your data frame is called df and the column name is col_name, you can do:
tab <- table(pmin(df$col_name, 7))
The count under 7 then includes all values of 7 and above together. You can rename it to make this clearer.
names(tab)[7] <- '7&above'
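A runnable version of the pmin approach. It assumes the raw observations are available; here they are reconstructed from the counts in the question with rep, and the hypothetical vector vals stands in for df$col_name.

```r
counts <- c(162, 148, 108, 93, 67, 83, 44, 53, 37, 47, 25, 34, 17, 22, 11, 11, 5)
vals <- rep(1:17, counts)    # stand-in for df$col_name

tab <- table(pmin(vals, 7))  # every value above 7 is counted as 7
names(tab)[7] <- "7&above"
tab
```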

Related

How to write a loop for this case in R?

I have a data frame with 121 rows and about 10 columns. One column is Station, another is depth, and the rest are chemical variables (temperature, salinity, etc.). I want to calculate the integrated value of each chemical property by station, using the function oce::integrateTrapezoid. It's my first time writing a loop, so I don't know how. Could you help me?
dA <- matrix(data = NA, nrow = 121, ncol = 3)
for (Station in unique(datos$Station)) {
  dA[Station, cd] <- integrateTrapezoid(cd, Profundidad..m., "cA")
}
Station Depth temp
1 10 28
1 50 25
1 100 15
1 150 10
2 9 27
2 45 24
2 98 14
2 152 11
3 11 28.7
3 48 23
3 102 14
3 148 9
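The attempt above cannot work as written: cd is never defined, the depth and variable columns are not taken from the current station's rows, and "cA" appears to request a cumulative result rather than a single total. Below is a minimal sketch of a per-station loop using the sample table. To keep it self-contained it uses a small hypothetical helper, trapz, for the trapezoidal rule; if oce is installed, oce::integrateTrapezoid(sub$Depth, sub[[v]]) should be usable in its place (check its type argument in the oce documentation). The column names follow the sample table; in the real data the depth column seems to be Profundidad..m.

```r
# Sample data from the question (columns Station, Depth, temp)
datos <- data.frame(
  Station = rep(1:3, each = 4),
  Depth   = c(10, 50, 100, 150, 9, 45, 98, 152, 11, 48, 102, 148),
  temp    = c(28, 25, 15, 10, 27, 24, 14, 11, 28.7, 23, 14, 9)
)

# Hypothetical helper: trapezoidal rule for the area under y(x)
trapz <- function(x, y) {
  o <- order(x)                 # make sure depths are increasing
  x <- x[o]; y <- y[o]
  sum(diff(x) * (head(y, -1) + tail(y, -1)) / 2)
}

vars <- setdiff(names(datos), c("Station", "Depth"))  # variables to integrate
stations <- unique(datos$Station)

# One row per station, one column per variable
result <- matrix(NA_real_, nrow = length(stations), ncol = length(vars),
                 dimnames = list(stations, vars))

for (i in seq_along(stations)) {
  sub <- datos[datos$Station == stations[i], ]   # rows of this station only
  for (v in vars) {
    result[i, v] <- trapz(sub$Depth, sub[[v]])
  }
}
result
```

Sizing the result by the number of stations and variables (instead of a fixed 121 x 3) means it adapts when salinity and the other columns are included.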

Doing row products conditionally on the values of specific columns

I have a dataset of this form.
a=data.frame(A=1:5,B=1:5,matrix(seq(50),nrow = 5))
colnames(a)<-c("A","B", paste0(1:10))
A B 1 2 3 4 5 6 7 8 9 10
1 1 1 6 11 16 21 26 31 36 41 46
2 2 2 7 12 17 22 27 32 37 42 47
3 3 3 8 13 18 23 28 33 38 43 48
4 4 4 9 14 19 24 29 34 39 44 49
5 5 5 10 15 20 25 30 35 40 45 50
I intend to use apply to compute the product of each row conditionally on the values of A and B. Take row 2, for instance: A=2 and B=2, so the code should look for column "2" through column "4" (A+B) and multiply all the elements in that range. The result is thus 7*12*17=1428.
I can do it for a row
prod(a[1,match(a$A[1],colnames(a)):match(a$A[1]+a$B[1],colnames(a))])
but I can't figure out a way to apply it to the whole data.frame. Any help?
Here is one option with apply: specify MARGIN = 1 to loop over the rows, create the indices by matching the first two elements (A and B) against the column names, build a sequence with :, subset the values of x, and take the prod.
apply(a, 1, function(x) {
  i1 <- match(x[1], names(x))         # column whose name equals A
  i2 <- match(x[1] + x[2], names(x))  # column whose name equals A + B
  prod(x[i1:i2])                      # product over that range of columns
})
#[1] 6 1428 150696 17535024 2362500000
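An alternative sketch that builds the column names directly instead of matching positions, and keeps a as a data frame rather than letting apply convert it to a matrix. It assumes A + B never exceeds the last numbered column (10 here); res is a hypothetical name.

```r
a <- data.frame(A = 1:5, B = 1:5, matrix(seq(50), nrow = 5))
colnames(a) <- c("A", "B", paste0(1:10))

# For each row, build the column names "A" .. "A+B" and multiply those cells
res <- vapply(seq_len(nrow(a)), function(i) {
  cols <- as.character(a$A[i]:(a$A[i] + a$B[i]))
  prod(unlist(a[i, cols]))
}, numeric(1))
res
```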

Trying to use fewer than all the factors across columns in a t test

I have two different factorial experiments. Let's say that in experiment one there is a treatment column that splits all reps into treatments 1 and 2. Then there is a second treatment where columns are split again, and a third column splitting them again. There is also a code for each treatment (8, if you're following). I need to do t tests between 2 opposing treatments.
I've tried factor, indexing mydata by column name, and subset, and I get error messages each time, especially since the t test then has 80 variables in the independent variable. Here are my attempts (except the factor one):
myvars <- c("SH1RUC", "SH1RC")
newdata <- mydata[myvars]
newdata <- subset(december, shadehouse=="1" & system=="open" & media=="coir")
I'd like to be able to grab either shadehouse, either system and either media for doing t tests. Otherwise I'd like to grab the name, i.e. "SH1RUC" or "SH1RC," grouped together, to run a t test.
Based on the comment, here is a sample dataset:
Dep1 Dep2 Dep3 Ind1 Ind2 Ind3
1 1 3 5 4.63 65 21
2 1 3 5 5.25 64 22
3 1 3 6 4.76 67 23
4 1 3 6 5.87 65 24
5 1 4 5 4.65 87 25
6 1 4 5 5.76 67 21
7 1 4 6 3.99 75 22
8 1 4 6 4.09 46 23
9 2 3 5 5.98 68 24
10 2 3 5 3.67 79 25
11 2 3 6 5.43 75 22
12 2 3 6 4.56 57 23
13 2 4 5 5.43 65 24
14 2 4 5 2.99 68 25
15 2 4 6 4.09 58 26
16 2 4 6 5.70 56 23
I'm trying to perform a t test between two specific sets, for example rows 1 & 2 versus rows 9 & 10, or rows 5 & 6 versus rows 7 & 8. In the actual data there are 10 data points in each set, and I want to compare the means. I can't seem to group the columns together effectively.
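One way to pick out the two sets is to subset on the combination of Dep1, Dep2 and Dep3 that defines each set, then pass the two resulting vectors to t.test. A minimal sketch on the sample data, assuming Ind1 is the measured response being compared; g1 and g2 are hypothetical names. With only two values per set the test has almost no power, so this only shows the mechanics.

```r
df <- data.frame(
  Dep1 = rep(1:2, each = 8),
  Dep2 = rep(rep(3:4, each = 4), 2),
  Dep3 = rep(rep(5:6, each = 2), 4),
  Ind1 = c(4.63, 5.25, 4.76, 5.87, 4.65, 5.76, 3.99, 4.09,
           5.98, 3.67, 5.43, 4.56, 5.43, 2.99, 4.09, 5.70)
)

# Rows 1 & 2: Dep1 = 1, Dep2 = 3, Dep3 = 5
g1 <- subset(df, Dep1 == 1 & Dep2 == 3 & Dep3 == 5)$Ind1
# Rows 9 & 10: Dep1 = 2, Dep2 = 3, Dep3 = 5
g2 <- subset(df, Dep1 == 2 & Dep2 == 3 & Dep3 == 5)$Ind1

tt <- t.test(g1, g2)  # Welch two-sample t test by default
tt
```

Changing the conditions in the two subset calls selects any other pair of sets, for example Dep3 == 5 versus Dep3 == 6 within Dep1 == 1 & Dep2 == 4 for rows 5 & 6 versus rows 7 & 8.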

R: Creating a vector with certain values from another vector

So I have a csv file with column headers ID, Score, and Age.
In R I have:
data <- read.csv(file.choose(), header=T)
attach(data)
I would like to create two new vectors with the scores of people whose age is below 70 and above 70. I thought there was a nice and quick way to do this, but I can't find it anywhere. Thanks for any help.
Example of what the data looks like:
ID, Score, Age
1, 20, 77
2, 32, 65
.... etc
I am trying to make two vectors: one with the scores of everyone younger than 70 and one with the scores of everyone older than 70.
Assuming data looks like this:
Score Age
1 1 29
2 5 39
3 8 40
4 3 89
5 5 31
6 6 23
7 7 75
8 3 3
9 2 23
10 6 54
.. . ..
You can use:
df_old <- data[data$Age >= 70,]
df_young <- data[data$Age < 70,]
which gives you
> df_old
Score Age
4 3 89
7 7 75
11 7 97
13 3 101
16 5 89
18 5 89
19 4 96
20 3 97
21 8 75
and
> df_young
Score Age
1 1 29
2 5 39
3 8 40
5 5 31
6 6 23
8 3 3
9 2 23
10 6 54
12 4 23
14 2 23
15 4 45
17 7 53
PS: if you only want the scores and not the age, you could use this:
df_old <- data[data$Age >= 70, "Score"]
df_young <- data[data$Age < 70, "Score"]
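Both vectors can also be produced in one step with split, which groups the scores by a label derived from Age. A minimal sketch using the first ten rows of the example data; scores_by_group is a hypothetical name, and, like the answer, it counts age 70 as "old".

```r
data <- data.frame(
  Score = c(1, 5, 8, 3, 5, 6, 7, 3, 2, 6),
  Age   = c(29, 39, 40, 89, 31, 23, 75, 3, 23, 54)
)

# Split the Score vector into a list keyed by the age group
scores_by_group <- split(data$Score, ifelse(data$Age >= 70, "old", "young"))
scores_by_group$old
scores_by_group$young
```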

R: Creating a new column based on positions specified by an existing column

I have a dataset (df) that looks like this:
A B C D E Position
67 68 69 70 71 5
20 21 22 23 24 2
98 97 96 95 94 3
2 5 7 9 12 5
4 8 12 16 20 4
I am trying to create a new column (Result) whose value in each row is the value of that row's column at the position given by the Position column.
For example, if the row 1 of position column is 5, the Result column will have the value of 5th column of row 1.
My resulting column will look like this:
A B C D E Position Result
67 68 69 70 71 5 71
20 21 22 23 24 2 21
98 97 96 95 94 3 96
2 5 7 9 12 5 12
4 8 12 16 20 4 16
I used the following command, which does not give me what I need. It lumps all of the position column values in each row. I am unable to determine how I can get the correct result.
Any help is appreciated.
Thanks!
Use matrix indexing to extract the values:
df[cbind(1:nrow(df), df$Position)]
# [1] 71 21 96 12 16
Assign the result in the normal way:
df$Result <- df[cbind(1:nrow(df), df$Position)]
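Matrix indexing is the vectorised way to do this. For comparison, an explicit row-by-row sketch makes the logic visible: for row i, take column Position[i] and read its i-th element. It is slower on large data and is shown only to illustrate what the cbind index selects.

```r
df <- data.frame(A = c(67, 20, 98, 2, 4), B = c(68, 21, 97, 5, 8),
                 C = c(69, 22, 96, 7, 12), D = c(70, 23, 95, 9, 16),
                 E = c(71, 24, 94, 12, 20), Position = c(5, 2, 3, 5, 4))

# For each row i, pick column number Position[i] and take its i-th value
df$Result <- vapply(seq_len(nrow(df)),
                    function(i) df[[df$Position[i]]][i],
                    numeric(1))
df
```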
