Subsetting multiple variables in one column in R

I only have basic knowledge of R, so I hope you can help me with my problem and that it is not too stupid a question ;-)
I have a dataset called "rope". It looks like this:
head(rope)
X...Sound Time.real. Time.in.Video. Observations
1 5_min_blank 10:18 03:59 (2) 2
2 5_min_blank NA
3 Fisch1 10:23 08:59 6
4 Fisch1 NA
5 Fisch1 NA
6 Fisch1 NA
Observation.total.time Time.of.the.shark.in.the.video
1 60 23
2 37
3 157 17
4 46
5 37
6 28
Time.of.the.shark.entering.the.video
1 04:03
2 04:20
3 08:49
4 09:06
5 09:23
6 10:21
Time.of.the.shark.leaving.the.video
1 04:26
2 04:57
3 09:05
4 09:52
5 10:00
6 10:49
times.the.shark.turns.to.the.speaker directional.change
1 1 5
2 2 11
3 1 1
4 4 6
5 3 6
6 2 7
flap.of.the.fins..fotf. flap.of.the.fins..second corrected.fotf.s
1 14 0,608695652 0.7777778
2 14 0,378378378 0.5600000
3 0 NA
4 30 0,652173913 0.6818182
5 0 0 NA
6 15 0,535714286 0.6521739
Notes complete.cyrcles swims.below.b..above.a..speaker
1 1 NA
2 NA
3 NA
4 2 NA
5 NA
6 NA
Swimming.patterns date X
1 3 21.07.17 NA
2 9 21.07.17 NA
3 NA 21.07.17 NA
4 9 21.07.17 NA
5 4 21.07.17 NA
6 4 21.07.17 NA
Now I have different sounds. The first sound is "Fish1", but I also have "Fish2" and "Diving", for example. Between the sounds are the corresponding pauses, which are called "Fish1_pause", "Fish2_pause", "Diving_pause", etc.
I would like to split my data into the sound data points and the "pause" data points.
I tried:
sound<-subset(rope, rope$X...Sound=="Fish1"& rope$X...Sound=="Fish2")
but I got no data points at all. If I only type:
sound<-subset(rope, rope$X...Sound=="Fish1")
I receive all data points where I have the Fish1 sound.
My question now is: how can I get all sound points?
With the "&" it didn't work. I hope you understand my problem and can help me.
Thank you very much and all the best
Jessi

sound<-subset(rope, rope$X...Sound=="Fish1"& rope$X...Sound=="Fish2")
should be replaced by either
sound<-subset(rope, rope$X...Sound == "Fish1" | rope$X...Sound == "Fish2")
or
sound<-subset(rope, rope$X...Sound %in% c("Fish1","Fish2"))
As it is, you are asking for observations where X...Sound is simultaneously "Fish1" and "Fish2" -- which is impossible.
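The same idea extends to splitting every sound from every pause at once. The following is a minimal sketch, not the asker's actual data: the small `rope` data frame and its `Observations` values are made up for illustration, and it assumes that every pause label ends in "_pause".

```r
# Made-up stand-in for the asker's data; only the X...Sound labels matter here
rope <- data.frame(
  X...Sound = c("Fish1", "Fish1_pause", "Fish2", "Fish2_pause",
                "Diving", "Diving_pause"),
  Observations = c(6, 1, 3, 2, 4, 1),  # illustrative values only
  stringsAsFactors = FALSE
)

# Two equivalent ways to keep rows whose label is Fish1 OR Fish2
sound_or <- subset(rope, X...Sound == "Fish1" | X...Sound == "Fish2")
sound_in <- subset(rope, X...Sound %in% c("Fish1", "Fish2"))

# Split all rows by whether the label ends in "_pause"
# (assumption: pauses are always named "<sound>_pause")
is_pause <- grepl("_pause$", rope$X...Sound)
pauses <- rope[is_pause, ]
sounds <- rope[!is_pause, ]
```

With this pattern-based split, any label that does not end in "_pause" (for example a blank recording) lands in `sounds`, so such rows may need to be filtered out separately.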

Related

Get the average of the values of one column for the values in another

I was not sure how to ask this question. I am trying to find the average tone when an initiative is mentioned and, additionally, when a topic and a goal (or achievement) are mentioned. My data frame (df) has many mentions of 70 initiatives, meaning it has 500+ rows of data but only 70 distinct initiatives.
My data looks like this:
> tabmean
Initiative Topic Goals Achievements Tone
1 52 44 2 2 2
2 294 42 2 2 2
3 103 31 2 2 2
4 52 41 2 2 2
5 87 26 2 1 1
6 52 87 2 2 2
7 136 81 2 2 2
8 19 7 2 2 1
9 19 4 2 2 2
10 0 63 2 2 2
11 0 25 2 2 2
12 19 51 2 2 2
13 52 51 2 2 2
14 108 94 2 2 1
15 52 89 2 2 2
16 110 37 2 2 2
17 247 25 2 2 2
18 66 95 2 2 2
19 24 49 2 2 2
20 24 110 2 2 2
I want to find the mean Tone when an Initiative is mentioned, as well as the mean Tone when an Initiative, a Topic and a Goal are mentioned at the same time. The codes for Tone are: positive (1), neutral (2), negative (3), and both positive and negative (4). Goals and Achievements are coded yes (1) and no (2).
I have used this code:
GoalMeanTone <- tabmean %>%
group_by(Initiative,Topic,Goals,Tone) %>%
summarize(averagetone = mean(Tone))
which gives this output:
GoalMeanTone
# A tibble: 454 x 5
# Groups: Initiative, Topic, Goals [424]
Initiative Topic Goals Tone averagetone
<chr> <chr> <chr> <chr> <dbl>
1 0 104 2 0 NA
2 0 105 2 0 NA
3 0 22 2 0 NA
4 0 25 2 0 NA
5 0 29 2 0 NA
6 0 30 2 1 NA
7 0 31 1 1 NA
8 0 42 1 0 NA
9 0 44 2 0 NA
10 0 44 NA 0 NA
# ... with 444 more rows
Note that for Initiative, the value 0 means "other initiative".
I have also tried this code:
library(plyr)
GoalMeanTone2 <- ddply( tabmean, .(Initiative), function(x) mean(tabmean$Tone) )
with this output:
> GoalMeanTone2
Initiative V1
1 0 NA
2 1 NA
3 101 NA
4 102 NA
5 103 NA
6 104 NA
7 105 NA
8 107 NA
9 108 NA
10 110 NA
Note that in both cases I do not get an average for Tone but NAs instead.
I have removed the NAs from the column "Tone" and have also tried removing all the other missing values in the df (only about 30 values were deleted).
I have also recoded the values for Tone:
tabmean<-Meantable %>% mutate(Tone=recode(Tone,
`1`="1",
`2`="0",
`3`="-1",
`4`="2"))
I still cannot manage to get the average tone for an initiative. Maybe the solution is more obvious than I think, but I have gotten stuck and have no idea how to proceed.
I'd be super grateful for better code to get this. Thanks!
I'm not completely sure what you mean by 'the average tone when an initiative is mentioned', but let's say that you want the average tone for the rows where Initiative is 1. You could try the following:
tabmean %>% filter(Initiative == 1) %>% summarise(avg_tone = mean(Tone, na.rm = TRUE))
Note that (1) you have to add na.rm=TRUE to the mean() call inside summarise if you have missing values in the column that you are summarising, otherwise it will produce NA, and (2) the column must be numeric (you can check that with str(tabmean) and, for example, convert Tone with tabmean <- tabmean %>% mutate(Tone=as.numeric(Tone))). Your output shows Tone as <chr>, and mean() of a character column returns NA.
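To get one average per initiative rather than for a single initiative, the same two fixes can be combined with group_by. This is a minimal sketch on made-up rows (the values below are illustrative, not the asker's data); it assumes Tone is stored as character, as the tibble output in the question suggests, and that Tone itself should not be a grouping variable, since the mean is taken over it.

```r
library(dplyr)

# Made-up rows mimicking the question's layout; Tone is character on purpose
tabmean <- data.frame(
  Initiative = c("52", "294", "52", "19", "19", "0"),
  Topic      = c("44", "42", "41", "7", "4", "63"),
  Goals      = c("2", "2", "2", "2", "2", "2"),
  Tone       = c("2", "2", "1", "1", NA, "2"),
  stringsAsFactors = FALSE
)

# Average tone per initiative: convert Tone first, then drop NAs in mean()
tone_by_initiative <- tabmean %>%
  mutate(Tone = as.numeric(Tone)) %>%
  group_by(Initiative) %>%
  summarise(avg_tone = mean(Tone, na.rm = TRUE))

# Average tone per Initiative/Topic/Goals combination (Tone is not grouped on)
tone_by_combo <- tabmean %>%
  mutate(Tone = as.numeric(Tone)) %>%
  group_by(Initiative, Topic, Goals) %>%
  summarise(avg_tone = mean(Tone, na.rm = TRUE), .groups = "drop")
```

A group whose Tone values are all missing still yields NaN, so such groups may need to be filtered out beforehand.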

Error with repolr: Error in rowSums(t(mapply(complete.cases, split.data))) : 'x' must be numeric

I am trying to analyse a dataset in R.
The data set looks something like this:
ID visit REL_3 LOAN ee dp pa exer alcohol e p d
1 2 1 4 2 44 12 32 122.0 8 2 0 2
2 2 2 4 2 44 48 75 78.5 8 2 2 2
3 2 3 4 1 26 17 49 222.5 8 1 2 2
4 2 4 NA NA NA NA NA NA NA NA NA NA
5 3 1 4 6 27 13 48 78.0 44 2 2 2
6 3 2 4 6 46 13 37 49.0 38 2 1 2
Except for ID and visit, all the variables are numeric.
However when I try to fit:
repolr(e~REL_3*LOAN*alcohol*exer, data=mss, categories=3,subjects = "ID",times = c(1,2,3,4),corr.mod = "ar1",alpha=0.5)
I get this error:
Error in rowSums(t(mapply(complete.cases, split.data))) : 'x' must be numeric
I had to add NAs because the repolr package requires entries for all the patients for all the visits, even if there is no data for that particular visit and subject.
I don't know how to proceed and would really appreciate some help on this.
Regards,
Shalom

Aggregation of all possible unique combinations with observations in the same column in R

I am trying to shorten a chunk of code to make it faster and easier to modify. This is a short example of my data.
order obs year var1 var2 var3
1 3 1 1 32 588 NA
2 4 1 2 33 689 2385
3 5 1 3 NA 678 2369
4 33 3 1 10 214 1274
5 34 3 2 10 237 1345
6 35 3 3 10 242 1393
7 78 6 1 5 62 NA
8 79 6 2 5 75 296
9 80 6 3 5 76 500
10 93 7 1 NA NA NA
11 94 7 2 4 86 247
12 95 7 3 3 54 207
Basically, I want R to find every possible unique combination of two values (observations) in column "obs", within the same year, and to create a new matrix or data frame whose observations are the aggregation of the originals. Order is not important, so 1+6 = 6+1. For instance, with 150 observations I would expect 11,175 feasible combinations (each year).
I sort of got what I want with basic coding but, as you will see, it is far too long (I have built 66 different new data sets this way, so it does not really make sense) and I am wondering how to shorten it. I did some trials (plyr, ...) with no real success. Here is what I did:
# For the 1st year, groups of 2 obs
newmatrix <- data.frame(t(combn(unique(data$obs[data$year==1]), 2)))
colnames(newmatrix) <- c("obs1", "obs2")
newmatrix$name <- do.call(paste, c(newmatrix[c("obs1", "obs2")], sep = "_"))
# and the aggregation of var. using indexes, which I will skip here to save your time :)
To illustrate, here is the result I would get for the 1st year with the sample above. The NAs appear because I only computed the pairs where both values were valid, and only for variables 1 and 3. I used the sum, but it could be any other function:
order obs1 obs2 year var1 var3
1 1 1 3 1_3 42 NA
2 2 1 6 1_6 37 NA
3 3 1 7 1_7 NA NA
4 4 3 6 3_6 15 NA
5 5 3 7 3_7 NA NA
6 6 6 7 6_7 NA NA
And the first 2 lines for the 3rd year, same type of matrix:
order obs1 obs2 year var1 var3
1 1 1 3 1_3 NA 3762
2 2 1 6 1_6 NA 2869
.......... etc ............
I hope I explained myself clearly. Thank you in advance for your hints on how to do this more efficiently.
I would use split-apply-combine to split by year, find all the combinations, and then combine back together:
do.call(rbind, lapply(split(data, data$year), function(x) {
p <- combn(nrow(x), 2)
data.frame(order=paste(x$order[p[1,]], x$order[p[2,]], sep="_"),
obs1=x$obs[p[1,]],
obs2=x$obs[p[2,]],
year=x$year[1],
var1=x$var1[p[1,]] + x$var1[p[2,]],
var2=x$var2[p[1,]] + x$var2[p[2,]],
var3=x$var3[p[1,]] + x$var3[p[2,]])
}))
# order obs1 obs2 year var1 var2 var3
# 1.1 3_33 1 3 1 42 802 NA
# 1.2 3_78 1 6 1 37 650 NA
# 1.3 3_93 1 7 1 NA NA NA
# 1.4 33_78 3 6 1 15 276 NA
# 1.5 33_93 3 7 1 NA NA NA
# 1.6 78_93 6 7 1 NA NA NA
# 2.1 4_34 1 3 2 43 926 3730
# 2.2 4_79 1 6 2 38 764 2681
# 2.3 4_94 1 7 2 37 775 2632
# 2.4 34_79 3 6 2 15 312 1641
# 2.5 34_94 3 7 2 14 323 1592
# 2.6 79_94 6 7 2 9 161 543
# 3.1 5_35 1 3 3 NA 920 3762
# 3.2 5_80 1 6 3 NA 754 2869
# 3.3 5_95 1 7 3 NA 732 2576
# 3.4 35_80 3 6 3 15 318 1893
# 3.5 35_95 3 7 3 13 296 1600
# 3.6 80_95 6 7 3 8 130 707
This enables you to be very flexible in how you combine data pairs of observations within a year --- x[p[1,],] represents the year-specific data for the first element in each pair and x[p[2,],] represents the year-specific data for the second element in each pair. You can return a year-specific data frame with any combination of data for the pairs, and the year-specific data frames are combined into a single final data frame with do.call and rbind.
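Since the question mentions that the sum could be replaced by any other function, the split-apply-combine approach can be wrapped so that the pairwise function is a parameter. This is a minimal sketch under assumptions: `pair_aggregate` is a hypothetical helper name (not from any package), the function passed as FUN must be vectorised over two equal-length vectors, and every year needs at least two observations for combn to work.

```r
# Sample data from the question
data <- data.frame(
  order = c(3, 4, 5, 33, 34, 35, 78, 79, 80, 93, 94, 95),
  obs   = rep(c(1, 3, 6, 7), each = 3),
  year  = rep(1:3, times = 4),
  var1  = c(32, 33, NA, 10, 10, 10, 5, 5, 5, NA, 4, 3),
  var2  = c(588, 689, 678, 214, 237, 242, 62, 75, 76, NA, 86, 54),
  var3  = c(NA, 2385, 2369, 1274, 1345, 1393, NA, 296, 500, NA, 247, 207)
)

# Hypothetical helper: combine every pair of rows within a year using FUN
pair_aggregate <- function(d, vars, FUN = `+`) {
  pieces <- lapply(split(d, d$year), function(x) {
    p <- combn(nrow(x), 2)  # each column holds the row indices of one pair
    out <- data.frame(obs1 = x$obs[p[1, ]],
                      obs2 = x$obs[p[2, ]],
                      year = x$year[1])
    # Apply FUN to the paired values of each requested variable
    for (v in vars) out[[v]] <- FUN(x[[v]][p[1, ]], x[[v]][p[2, ]])
    out
  })
  do.call(rbind, pieces)
}

sums  <- pair_aggregate(data, c("var1", "var2", "var3"))    # pairwise sums
maxes <- pair_aggregate(data, c("var1", "var3"), FUN = pmax) # pairwise maxima
```

Passing only the variables of interest in `vars` avoids repeating one line per column, which is where the original code grows longest.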

R: Positioning labels and axes with rgl.plot3d

I'm trying to create a 3D scatter plot using rgl's plot3d. However, the default positioning of the labels and axes is not satisfactory. For example, the y-axis label is positioned on the far side, while I want it on the near side. The x-axis ticks are positioned at the far top; I want them at the near bottom. I looked at ?par3d but couldn't find anything that would help me. Is it possible to do this in rgl? Code and data are given below. Thank you.
Code
d <- read.table(file='myfile.dat', header=F)
plot3d(
d,
xlim=c(0,20),
ylim=c(0,20),
zlim=c(0,10000),
box=F,
type='p',
size=5,
col=d[,1]
)
mtext3d(text='Test', edge='y+-', line=2)
axes3d(
edges=c('x--', 'y+-', 'z--'),
labels=T
)
lines3d(
d,
lwd=2,
col=d[,1]
)
grid3d(side=c('x', 'y+', 'z'))
Data
11 2 2
NA NA NA
10 2 2
NA NA NA
13 2 1
NA NA NA
15 2 1
NA NA NA
5 2 11
5 3 10
5 4 16
5 5 34
5 6 102
5 7 294
5 8 682
5 9 1439
5 10 2646
5 11 3615
5 12 2844
5 13 1394
NA NA NA
4 2 10
4 3 4
4 4 4
4 5 10
4 6 38
4 7 132
4 8 396
4 9 976
4 10 2121
4 11 4085
4 12 6261
4 13 6459
4 14 4238
4 15 1394
NA NA NA
7 2 3
NA NA NA
6 2 2
NA NA NA
9 2 8
9 3 6
9 4 4
9 5 5
NA NA NA
8 2 4
8 3 10
8 4 22
8 5 52
8 6 126
8 7 264
8 8 478
8 9 729
8 10 943
8 11 754
8 12 382
NA NA NA
You need to look at ?axis3d where the use of the 'edges' parameter is described. If you want the x-axis tick labels at the front-bottom and the y-axis on the near+bottom side, you would first build the plot using ..., axes=FALSE, and with the focus unchanged issue this command at the console:
axes3d( edges=c("x--", "y--", "z") )
I have not yet figured out whether it is possible to remove an existing axis in an rgl plot.
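Put together, the approach of drawing the plot without axes and then adding them on chosen edges might look like the sketch below. The data frame is made up to stand in for myfile.dat, the z limit is adjusted to it, and the `rgl.useNULL` option is only set so that the code can run without a display (an assumption about the environment; drop it to get an interactive window).

```r
options(rgl.useNULL = TRUE)  # assumption: no display available; remove for a window
library(rgl)

# Made-up points standing in for myfile.dat
d <- data.frame(V1 = rep(c(4, 5, 8), each = 4),
                V2 = rep(2:5, times = 3),
                V3 = c(10, 4, 4, 10, 11, 10, 16, 34, 4, 10, 22, 52))

# Draw the points only: no default axes, box, or axis titles
plot3d(d, xlim = c(0, 20), ylim = c(0, 20), zlim = c(0, 60),
       type = "p", size = 5, col = d[, 1],
       axes = FALSE, box = FALSE, xlab = "", ylab = "", zlab = "")

# Add axes on the chosen edges, then a label along the same y edge
axes3d(edges = c("x--", "y--", "z"))
mtext3d("Test", edge = "y--", line = 2)
```

Each edge code names an axis followed by the sides of the other two dimensions it sits on, so other combinations of "+" and "-" move an axis to a different corner of the bounding box.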

timeSeries align not finishing its job

I'm using the timeSeries package, and in particular the align function. My data have gaps and I want to fill the NAs by propagating the last available value. But it seems that align() does not fill through to the end of the sample when a series ends with an NA.
An example: I have a non-aligned time series
> notAligned
GMT
TS.1 TS.2 TS.3 TS.4
2011-02-03 NA 1 4 8
2011-02-04 1 NA 2 NA
2011-02-07 5 6 NA NA
2011-02-08 NA 2 NA 9
If I use the align function, it returns this
> align(notAligned)
GMT
TS.1 TS.2 TS.3 TS.4
2011-02-03 NA 1 4 8
2011-02-04 1 1 2 8
2011-02-07 5 6 NA 8
2011-02-08 NA 2 NA 9
It correctly fills TS.2 on the 4th and TS.4 on the 4th and 7th, but doesn't fill TS.1 on the 8th with 5, or TS.3 on the 7th and 8th with 2. I would expect align to fill them...
Did I misunderstand the function? Is there a way to work around this?
Thanks for your help
I have no idea why timeSeries::align doesn't work, but I would just use zoo::na.locf:
na.locf(notAligned, na.rm=FALSE)
# GMT
# TS.1 TS.2 TS.3 TS.4
# 2011-02-03 NA 1 4 8
# 2011-02-04 1 1 2 8
# 2011-02-07 5 6 2 8
# 2011-02-08 5 2 2 9
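For a self-contained check of this behaviour, the example series can be rebuilt as a zoo object. This is a sketch under the assumption that a zoo object is an acceptable stand-in for the asker's timeSeries object; only the values are taken from the question.

```r
library(zoo)

# Rebuild the example from the question as a zoo object
dates <- as.Date(c("2011-02-03", "2011-02-04", "2011-02-07", "2011-02-08"))
vals <- cbind(TS.1 = c(NA, 1, 5, NA),
              TS.2 = c(1, NA, 6, 2),
              TS.3 = c(4, 2, NA, NA),
              TS.4 = c(8, NA, NA, 9))
notAligned <- zoo(vals, order.by = dates)

# Carry the last observation forward; na.rm = FALSE keeps leading NAs in place
filled <- na.locf(notAligned, na.rm = FALSE)
```

The leading NA in TS.1 stays, because there is no earlier value to carry forward, while the trailing NAs in TS.1 and TS.3 are filled.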
