R: xlim, ylim and zlim not working for rgl.plot3d

I'm trying to create a 3d scatter plot using the following script:
d <- read.table(file='myfile.dat', header=F)
plot3d(
d,
xlim=c(0,20),
ylim=c(0,20),
zlim=c(0,10000),
xlab='Frequency',
ylab='Size',
zlab='Number of subgraphs',
box=F,
type='s',
size=0.5,
col=d[,1]
)
lines3d(
d,
xlim=c(2,20),
ylim=c(0,20),
zlim=c(0,10000),
lwd=2,
col=d[,1]
)
grid3d(side=c('x', 'y+', 'z'))
Now for some reason, R is ignoring the range limits I've specified and is using arbitrary values, messing up my plot. I get no error when I run the script. Does anybody have any idea what's wrong? If required, I can also post an image of the plot that is created. The data file is given below:
myfile.dat
11 2 2
NA NA NA
10 2 2
NA NA NA
13 2 1
NA NA NA
15 2 1
NA NA NA
5 2 11
5 3 10
5 4 16
5 5 34
5 6 102
5 7 294
5 8 682
5 9 1439
5 10 2646
5 11 3615
5 12 2844
5 13 1394
NA NA NA
4 2 10
4 3 4
4 4 4
4 5 10
4 6 38
4 7 132
4 8 396
4 9 976
4 10 2121
4 11 4085
4 12 6261
4 13 6459
4 14 4238
4 15 1394
NA NA NA
7 2 3
NA NA NA
6 2 2
NA NA NA
9 2 8
9 3 6
9 4 4
9 5 5
NA NA NA
8 2 4
8 3 10
8 4 22
8 5 52
8 6 126
8 7 264
8 8 478
8 9 729
8 10 943
8 11 754
8 12 382
NA NA NA

The help page, ?plot3d says "Note that since rgl does not currently support clipping, all points will be plotted, and 'xlim', 'ylim', and 'zlim' will only be used to increase the respective ranges." So you need to restrict the data in the input stage. (And you will need to use segments3d instead of lines3d if you only want particular ranges that are inside the plotted volume.)
d2 <- subset(d, d[,1]>0 & d[,1] <20 & d[,2]>0 & d[,2] <20 & d[,3]>0 & d[,3]<10000)
plot3d(
d2[, 1:3], # You can probably use something more meaningful,
xlim=c(0,20),
ylim=c(0,20),
zlim=c(0,10000),
xlab='Frequency',
ylab='Size',
zlab='Number of subgraphs',
box=F,
type='s',
size=0.5,
col=d2[,1]
)
(I did notice that with a z range of c(0,10000) the points at size=0.5 were practically invisible. Further experimentation suggests that the great disparity in ranges will cause further difficulties in keeping the lower limits at 0 once you increase the size enough for the points to be visible: if you make the points really big, the ranges expand to accommodate the parts of the spheres that extend beyond the x=0 or y=0 planes.)

As DWin said, lines3d does not handle *lim arguments. From the help page, "... Material properties (see rgl.material), normals and texture coordinates (see rgl.primitive)."
So use some other function, or perhaps you could retrieve the existing limits from your plot3d call and use those to scale your data prior to plotting?
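The "restrict the data before plotting" idea from both answers can be sketched in base R. The helper names in_range and clip_to_limits are hypothetical, and keeping all-NA rows assumes (as the question's data file suggests) that they serve as breaks between line segments:

```r
# Hypothetical helper: TRUE where v lies inside lim, or where v is NA
# (NA rows are kept on the assumption that they act as line separators).
in_range <- function(v, lim) is.na(v) | (v >= lim[1] & v <= lim[2])

# Keep only rows whose first three columns fall inside the given limits.
clip_to_limits <- function(d, xlim, ylim, zlim) {
  keep <- in_range(d[[1]], xlim) &
          in_range(d[[2]], ylim) &
          in_range(d[[3]], zlim)
  d[keep, , drop = FALSE]
}

# Small made-up sample: rows 4 and 5 fall outside the limits.
d <- data.frame(V1 = c(5, 5, NA, 25, 4),
                V2 = c(2, 3, NA, 2, 15),
                V3 = c(11, 10, NA, 4, 20000))
d_clipped <- clip_to_limits(d, xlim = c(0, 20), ylim = c(0, 20), zlim = c(0, 10000))
print(d_clipped)
```

The clipped data frame can then be passed to both plot3d and lines3d. If I read the rgl documentation correctly, the limits of an existing plot are available as the read-only par3d("bbox") property, which could supply the xlim, ylim and zlim arguments here.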

Related

How to treat with empty values contained in columns of data-set using r programming language?

I have learned how to impute NA values in R: we normally compute the average of the column (if it is numeric) and put it in place of each NA in that column. But what should I do if, instead of NA, the cell is simply empty?
Please help me.
Let's start with some test data:
person_id <- c("1","2","3","4","5","6","7","8","9","10")
inches <- as.numeric(c("56","58","60","62","64","","68","70","72","74"))
height <- data.frame(person_id,inches)
height
person_id inches
1 1 56
2 2 58
3 3 60
4 4 62
5 5 64
6 6 NA
7 7 68
8 8 70
9 9 72
10 10 74
The blank was already replaced with NA in height$inches, because as.numeric("") returns NA.
If the column were still character, you could also do this yourself:
height$inches[height$inches==""] <- NA
Now to fill in the NA with the average from the non-missing values of inches.
options(digits=4)
height$inches[is.na(height$inches)] <- mean(height$inches,na.rm=T)
height
person_id inches
1 1 56.00
2 2 58.00
3 3 60.00
4 4 62.00
5 5 64.00
6 6 64.89
7 7 68.00
8 8 70.00
9 9 72.00
10 10 74.00
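When the data comes from a file, the empty cells can be turned into NA at read time. The following is a minimal sketch, assuming CSV input; the city column and its values are made up for illustration. Blank numeric fields become NA on their own, while the na.strings argument also covers blank character fields:

```r
# Made-up CSV content; row 2 has an empty numeric and an empty character cell.
csv_lines <- c("person_id,inches,city",
               "1,56,Oslo",
               "2,,",
               "3,60,Rome")

# na.strings = "" makes empty character fields NA as well.
height <- read.csv(text = paste(csv_lines, collapse = "\n"),
                   na.strings = c("", "NA"),
                   stringsAsFactors = FALSE)

# Mean imputation for the numeric column, as in the answer above.
height$inches[is.na(height$inches)] <- mean(height$inches, na.rm = TRUE)
print(height)
```

The character column is left as NA here, since a mean is not defined for it; a placeholder such as "unknown" would be one possible choice.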

Subsetting multiple variables in one column in r

I only have basic knowledge of R, and I hope you can help me with my problem and that it's not too stupid a question ;-)
I have a dataset called "rope". It looks like the following :
head(rope)
X...Sound Time.real. Time.in.Video. Observations
1 5_min_blank 10:18 03:59 (2) 2
2 5_min_blank NA
3 Fisch1 10:23 08:59 6
4 Fisch1 NA
5 Fisch1 NA
6 Fisch1 NA
Observation.total.time Time.of.the.shark.in.the.video
1 60 23
2 37
3 157 17
4 46
5 37
6 28
Time.of.the.shark.entering.the.video
1 04:03
2 04:20
3 08:49
4 09:06
5 09:23
6 10:21
Time.of.the.shark.leaving.the.video
1 04:26
2 04:57
3 09:05
4 09:52
5 10:00
6 10:49
times.the.shark.turns.to.the.speaker directional.change
1 1 5
2 2 11
3 1 1
4 4 6
5 3 6
6 2 7
flap.of.the.fins..fotf. flap.of.the.fins..second corrected.fotf.s
1 14 0,608695652 0.7777778
2 14 0,378378378 0.5600000
3 0 NA
4 30 0,652173913 0.6818182
5 0 0 NA
6 15 0,535714286 0.6521739
Notes complete.cyrcles swims.below.b..above.a..speaker
1 1 NA
2 NA
3 NA
4 2 NA
5 NA
6 NA
Swimming.patterns date X
1 3 21.07.17 NA
2 9 21.07.17 NA
3 NA 21.07.17 NA
4 9 21.07.17 NA
5 4 21.07.17 NA
6 4 21.07.17 NA
Now I have different sounds. The first sound is "Fish1", but I also have "Fish2" and "Diving", for example. Between the sounds are the corresponding pauses, called "Fish1_pause", "Fish2_pause", "Diving_pause", etc.
Now I would like to split my data into the sound data points and the "pause" data points.
I tried:
sound<-subset(rope, rope$X...Sound=="Fish1"& rope$X...Sound=="Fish2")
but I got no data points at all. If I only type:
sound<-subset(rope, rope$X...Sound=="Fish1")
I receive all data points where I have the Fish1 sound.
My question now is: how can I get all sound points?
Because with the "&" it didn't work. I hope you understand my problem and can help me.
Thank you very much and all the best
Jessi
sound<-subset(rope, rope$X...Sound=="Fish1"& rope$X...Sound=="Fish2")
should be replaced by either
sound<-subset(rope, rope$X...Sound == "Fish1" | rope$X...Sound == "Fish2")
or
sound<-subset(rope, rope$X...Sound %in% c("Fish1","Fish2"))
As it is, you are asking for observations where X...Sound is simultaneously "Fish1" and "Fish2" -- which is impossible.
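Since the question asks for all sounds versus all pauses, listing every name may become tedious. A pattern match on the "_pause" suffix is one alternative, sketched here under the assumption that every pause label ends in "_pause"; the column name Sound and the sample rows are simplified stand-ins for the real rope data:

```r
# Simplified stand-in for the rope data set.
rope <- data.frame(Sound = c("Fish1", "Fish1_pause", "Fish2",
                             "Diving", "Diving_pause"),
                   Observations = 1:5,
                   stringsAsFactors = FALSE)

# TRUE for rows whose label ends in "_pause".
is_pause <- grepl("_pause$", rope$Sound)

pauses <- rope[is_pause, ]
sounds <- rope[!is_pause, ]

print(sounds)
print(pauses)
```

Rows such as "5_min_blank" would land in the sounds subset with this rule, so they may need to be excluded separately.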

Aggregation of all possible unique combinations with observations in the same column in R

I am trying to shorten a chunk of code to make it faster and easier to modify. This is a short example of my data.
order obs year var1 var2 var3
1 3 1 1 32 588 NA
2 4 1 2 33 689 2385
3 5 1 3 NA 678 2369
4 33 3 1 10 214 1274
5 34 3 2 10 237 1345
6 35 3 3 10 242 1393
7 78 6 1 5 62 NA
8 79 6 2 5 75 296
9 80 6 3 5 76 500
10 93 7 1 NA NA NA
11 94 7 2 4 86 247
12 95 7 3 3 54 207
Basically, what I want is for R to find every possible unique combination of two values (observations) in column "obs", within the same year, and create a new matrix or data frame whose observations are aggregations of the originals. Order is not important, so 1+6 = 6+1. For instance, with 150 observations, I would expect 11,175 feasible combinations (each year).
I sort of got what I want with basic coding but, as you will see, it is way too long (I have built 66 different new data sets this way, so it does not really make sense), and I am wondering how to shorten it. I did some trials (plyr, ...) with no real success. Here is what I did:
# For the 1st year, groups of 2 obs
newmatrix <- data.frame(t(combn(unique(data$obs[data$year==1]), 2)))
colnames(newmatrix) <- c("obs1", "obs2")
newmatrix$name <- do.call(paste, c(newmatrix[c("obs1", "obs2")], sep = "_"))
# and the aggregation of var. using indexes, which I will skip here to save your time :)
To illustrate, here is the result I would get for the 1st year, given the above sample. The NAs are there because I only computed results where both values were valid, and only for variables 1 and 3. Also, I used the sum, but it could be any other function:
order obs1 obs2 year var1 var3
1 1 1 3 1_3 42 NA
2 2 1 6 1_6 37 NA
3 3 1 7 1_7 NA NA
4 4 3 6 3_6 15 NA
5 5 3 7 3_7 NA NA
6 6 6 7 6_7 NA NA
As for the 2 first lines in the 3rd year, same type of matrix:
order obs1 obs2 year var1 var3
1 1 1 3 1_3 NA 3762
2 2 1 6 1_6 NA 2868
.......... etc ............
I hope I explained myself. Thank you in advance for your hints on how to do this more efficiently.
I would use split-apply-combine to split by year, find all the combinations, and then combine back together:
do.call(rbind, lapply(split(data, data$year), function(x) {
p <- combn(nrow(x), 2)
data.frame(order=paste(x$order[p[1,]], x$order[p[2,]], sep="_"),
obs1=x$obs[p[1,]],
obs2=x$obs[p[2,]],
year=x$year[1],
var1=x$var1[p[1,]] + x$var1[p[2,]],
var2=x$var2[p[1,]] + x$var2[p[2,]],
var3=x$var3[p[1,]] + x$var3[p[2,]])
}))
# order obs1 obs2 year var1 var2 var3
# 1.1 3_33 1 3 1 42 802 NA
# 1.2 3_78 1 6 1 37 650 NA
# 1.3 3_93 1 7 1 NA NA NA
# 1.4 33_78 3 6 1 15 276 NA
# 1.5 33_93 3 7 1 NA NA NA
# 1.6 78_93 6 7 1 NA NA NA
# 2.1 4_34 1 3 2 43 926 3730
# 2.2 4_79 1 6 2 38 764 2681
# 2.3 4_94 1 7 2 37 775 2632
# 2.4 34_79 3 6 2 15 312 1641
# 2.5 34_94 3 7 2 14 323 1592
# 2.6 79_94 6 7 2 9 161 543
# 3.1 5_35 1 3 3 NA 920 3762
# 3.2 5_80 1 6 3 NA 754 2869
# 3.3 5_95 1 7 3 NA 732 2576
# 3.4 35_80 3 6 3 15 318 1893
# 3.5 35_95 3 7 3 13 296 1600
# 3.6 80_95 6 7 3 8 130 707
This enables you to be very flexible in how you combine pairs of observations within a year: x[p[1,],] represents the year-specific data for the first element in each pair and x[p[2,],] represents the year-specific data for the second element in each pair. You can return a year-specific data frame with any combination of data for the pairs, and the year-specific data frames are combined into a single final data frame with do.call and rbind.
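Because the question mentions that the sum "could be any other function", the same split-apply-combine approach can be wrapped so that the variables and the combining function are arguments. This is only a sketch: pair_by_year is a hypothetical name, FUN is assumed to be vectorised over two equal-length vectors, and every year is assumed to have at least two rows (combn fails otherwise):

```r
# Hypothetical wrapper around the split / combn / rbind approach above.
pair_by_year <- function(data, vars, FUN = `+`) {
  pieces <- lapply(split(data, data$year), function(x) {
    p <- combn(nrow(x), 2)              # all unordered pairs of row indices
    out <- data.frame(obs1 = x$obs[p[1, ]],
                      obs2 = x$obs[p[2, ]],
                      year = x$year[1])
    for (v in vars) {
      out[[v]] <- FUN(x[[v]][p[1, ]], x[[v]][p[2, ]])
    }
    out
  })
  do.call(rbind, pieces)
}

# Excerpt of the sample data: obs 1, 3 and 6 in years 1 and 2, var1 only.
data <- data.frame(obs  = c(1, 1, 3, 3, 6, 6),
                   year = c(1, 2, 1, 2, 1, 2),
                   var1 = c(32, 33, 10, 10, 5, 5))

sums  <- pair_by_year(data, "var1")        # pairwise sums
maxes <- pair_by_year(data, "var1", pmax)  # pairwise maxima instead
print(sums)
```

Passing pmax, pmin or a custom two-argument function changes the aggregation without touching the pairing logic.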

How do I remove an extra line in a chart with ggplot

I'm a new R user and I am trying to chart an interaction between 2 continuous variables and a categorical variable.
Using interaction.plot:
interaction.plot(nonconform, trans, employdisc, type="b", col=(1:3) ,
leg.bty="o", leg.bg="beige", lwd=2, pch=c(18,24,22),
xlab="Nonconformity",
ylab="Discrimination",
main="Interaction Plot")
I get this result:
interaction plot
When I attempt to do the same thing with ggplot
ggplot(data=NTDS.zip, aes(x=nonconform, y=employdisc, colour = factor(trans), group=trans, )) +
stat_summary(fun.y=mean, geom="point") +
stat_summary(fun.y=mean, geom="line")
I get this result:
ggplot chart
There is an extra line (in grey) that I can't get rid of. It's likely representing missing data, but I haven't found a way to remove that line from the chart. The discussions I found were about suppressing warnings due to missing data, not about extra lines in a chart.
Any thoughts?
Update
After reading the R Graphics Cookbook I tried another method.
The book's method involved summarizing the data first.
tg <- ddply(ntds.new, c("trans", "nonconform"), summarize, empdisc=mean(employdisc))
and then plotting the chart.
I tried 2 types (colour and linetype)
ggplot(tg, aes(x=nonconform, y=empdisc, colour=trans))+geom_line()
ggplot(tg, aes(x=nonconform, y=empdisc, linetype=trans))+geom_line()
The plot with the colour statement has the extra line, while the plot with linetype does not.
the data for this was:
trans nonconform empdisc
1 1 0 1.104046
2 1 1 1.472050
3 1 2 1.930070
4 1 3 2.247706
5 1 4 3.407407
6 1 NA 7.250000
7 2 0 3.427230
8 2 1 3.929707
9 2 2 4.062275
10 2 3 4.373853
11 2 4 4.470149
12 2 NA 5.294118
13 3 0 1.309524
14 3 1 1.968310
15 3 2 2.366589
16 3 3 3.815000
17 3 4 3.560606
18 3 NA 6.000000
19 4 0 2.661290
20 4 1 3.208861
21 4 2 3.033195
22 4 3 3.322176
23 4 4 3.755906
24 4 NA 6.625000
25 NA 0 4.000000
26 NA 1 4.166667
27 NA 2 2.500000
28 NA 3 6.666667
29 NA 4 5.400000
30 NA NA 2.000000
I went back and deleted the (10) lines with missing cases for either trans or nonconform columns.
trans nonconform empdisc
1 1 0 1.104046
2 1 1 1.472050
3 1 2 1.930070
4 1 3 2.247706
5 1 4 3.407407
6 2 0 3.427230
7 2 1 3.929707
8 2 2 4.062275
9 2 3 4.373853
10 2 4 4.470149
11 3 0 1.309524
12 3 1 1.968310
13 3 2 2.366589
14 3 3 3.815000
15 3 4 3.560606
16 4 0 2.661290
17 4 1 3.208861
18 4 2 3.033195
19 4 3 3.322176
20 4 4 3.755906
This solved my initial problem but this solution seems more complicated than it should be, and I'm curious as to why the plot with "colour" was affected and the one with "linetype" wasn't.
If we look at your data in table tg, there are NA values for the variable trans.
When you use trans (as a factor) for the colours of the lines, those NA values are also plotted, because the default action of colour scales for NA levels is to plot them in grey50 (na.value="grey50"). For linetype scales, however, the default action for NA levels is to plot a blank line (na.value="blank"), so you don't see the line.
There are a couple of solutions. First, you can add scale_color_discrete() and set na.value= to NA.
ggplot(tg, aes(x=nonconform, y=empdisc, colour=as.factor(trans)))+
geom_line()+
scale_color_discrete(na.value=NA)
Another solution is to subset your data to remove the NA values and then plot it. This can also be done inside the ggplot() call.
ggplot(tg[complete.cases(tg),], aes(x=nonconform, y=empdisc, colour=as.factor(trans)))+
geom_line()
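Note that complete.cases(tg) drops a row if any column is NA. If only the grouping and x variables matter, the check can be limited to those columns. A small base R sketch, using a six-row excerpt of tg from the question:

```r
# Six rows taken from the tg table in the question.
tg <- data.frame(trans      = c(1, 1, NA, 2, 2, NA),
                 nonconform = c(0, NA, 0, 0, 1, NA),
                 empdisc    = c(1.104046, 7.250000, 4.000000,
                                3.427230, 3.929707, 2.000000))

# Keep rows where both trans and nonconform are present;
# NAs in other columns would not cause a row to be dropped.
keep <- complete.cases(tg[, c("trans", "nonconform")])
tg_clean <- tg[keep, ]
print(tg_clean)
```

The resulting tg_clean can be passed to ggplot() in place of tg[complete.cases(tg),].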

R: Positioning labels and axes with rgl.plot3d

I'm trying to create a 3d scatter plot using rgl.plot3d. However, the default positioning of the labels and axes is not satisfactory. E.g., the y-axis label is positioned on the far side, while I want it to be positioned on the near side. The x-axis ticks are positioned at the far top. I want them to be positioned at the near bottom. I looked at ?par3d but couldn't find anything that would help me. Is it possible to do this in rgl? Code and data are given below. Thank you.
Code
d <- read.table(file='myfile.dat', header=F)
plot3d(
d,
xlim=c(0,20),
ylim=c(0,20),
zlim=c(0,10000),
box=F,
type='p',
size=5,
col=d[,1]
)
mtext3d(text='Test', edge='y+-', line=2)
axes3d(
edges=c('x--', 'y+-', 'z--'),
labels=T
)
lines3d(
d,
lwd=2,
col=d[,1]
)
grid3d(side=c('x', 'y+', 'z'))
Data
11 2 2
NA NA NA
10 2 2
NA NA NA
13 2 1
NA NA NA
15 2 1
NA NA NA
5 2 11
5 3 10
5 4 16
5 5 34
5 6 102
5 7 294
5 8 682
5 9 1439
5 10 2646
5 11 3615
5 12 2844
5 13 1394
NA NA NA
4 2 10
4 3 4
4 4 4
4 5 10
4 6 38
4 7 132
4 8 396
4 9 976
4 10 2121
4 11 4085
4 12 6261
4 13 6459
4 14 4238
4 15 1394
NA NA NA
7 2 3
NA NA NA
6 2 2
NA NA NA
9 2 8
9 3 6
9 4 4
9 5 5
NA NA NA
8 2 4
8 3 10
8 4 22
8 5 52
8 6 126
8 7 264
8 8 478
8 9 729
8 10 943
8 11 754
8 12 382
NA NA NA
You need to look at ?axis3d, where the use of the 'edges' parameter is described. If you want the x-axis tick labels at the front-bottom and the y-axis on the near-bottom side, first build the plot using ..., axes=FALSE, and then, with the focus unchanged, issue this command at the console:
axes3d( edges=c("x--", "y--", "z") )
I have not yet figured out whether it is possible to remove an existing axis in an rgl plot.
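Put together, the sequence described above might look like the following sketch. It assumes the rgl package is installed (the code is skipped otherwise), uses five rows from the question's data, and suppresses the default axis labels so that a label can be placed by hand with mtext3d; the label text "Size" is only an example:

```r
# Request the null device so the sketch also runs without a display;
# drop this line to get an interactive window.
options(rgl.useNULL = TRUE)

axis_edges <- c("x--", "y--", "z")   # edge codes as described in ?axis3d

if (requireNamespace("rgl", quietly = TRUE)) {
  # Five rows from the question's data file.
  d <- data.frame(x = c(5, 5, 5, 4, 4),
                  y = c(2, 3, 4, 2, 3),
                  z = c(11, 10, 16, 10, 4))

  # Build the plot without the default axes and labels.
  rgl::plot3d(d, type = "p", size = 5, col = d[, 1],
              axes = FALSE, xlab = "", ylab = "", zlab = "")

  # Draw the axes on the chosen edges, then label the y axis on the same edge.
  rgl::axes3d(edges = axis_edges)
  rgl::mtext3d("Size", edge = "y--", line = 2)
}
```

Starting from axes=FALSE avoids having to remove an axis that has already been drawn.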
