How to draw these 2 functions on the same graph? - r

I want to draw 2 functions on the same graph, because the goal is to show whether one of them is uniformly higher than the other one.
Function 1:
f(x) = 0                      if x <= -0.05
f(x) = x + 0.05               if -0.05 < x <= 0.95
f(x) = 1                      if x > 0.95
Function 2:
g(x) = 0                      if x <= -0.16
g(x) = (2x + 2 - 1.68)^2 / 2  if -0.16 < x <= 0.34
g(x) = 1 - (1.68 - 2x)^2 / 2  if 0.34 < x <= 0.84
g(x) = 1                      if x > 0.84

Just spell out the comparisons in detail, so you are sure to get them right:
f <- function(x) {
  if( x <= -.05 )
    return(0)
  if( x <= .95 )
    return( x + 0.05 )
  ## the rest:
  return( 1 )
}
g <- function(x) {
  if( x <= -.16 )
    return(0)
  if( x <= .34 )
    return( (2*x+2-1.68)^2/2 )
  if( x <= 0.84 )
    return( 1-(1.68-2*x)^2/2 )
  ## the rest:
  return(1)
}
curve( Vectorize(f)(x), from=-1, to=2 )
curve( Vectorize(g)(x), from=-1, to=2, add=TRUE, col="red" )
legend( "right", col=c("black","red"), legend=c("f(x)", "g(x)"), lwd=2 )
(The Vectorize bit of the code may look weird. It's there because the functions as written can only process one number at a time, while the curve function needs the function to process multiple values at once. Vectorize fixes that.)
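If you prefer, you can also build the vectorized wrappers once and reuse them; a small variation on the same idea (fv and gv are just illustrative names):
fv <- Vectorize(f)
gv <- Vectorize(g)
curve( fv, from=-1, to=2 )
curve( gv, from=-1, to=2, add=TRUE, col="red" )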
Improvements to function 1:
The first function can be simplified greatly, since it is simply clamping x + 0.05 to a fixed range:
pmax(pmin(x+0.05,1),0) # would be one way
That expression makes sure the value never goes above 1, by taking the minimum of 1 and your value, and never goes below 0, by taking the maximum of 0 and your value.
If you have the raster package you can also use the clamp function from there:
library(raster)
clamp( x+0.05, 0, 1 )
It just cuts any value outside the range back to the nearest limit of the range.
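A quick way to convince yourself that the two formulations agree (a small check, not part of the original answer):
xs <- seq(-1, 2, by = 0.25)
all.equal( pmax(pmin(xs + 0.05, 1), 0), clamp(xs + 0.05, 0, 1) )   # should be TRUE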
You can add these to the graph yourself, one by one, to see that they end up right on top of the black line:
curve( pmax(pmin(x+0.05,1),0) , add=TRUE, col="green" )
curve( clamp(x+.05,0,1) , add=TRUE, col="blue" )

You can also use plot() with add = TRUE, like below:
plot(f, -1.5, 1.5)
plot(g, add = TRUE)
where
f <- function(x) {
  ifelse(x <= -0.05,
    0, ifelse(x <= 0.95,
      x + 0.05, 1
    )
  )
}
g <- function(x) {
  ifelse(x <= -0.16,
    0,
    ifelse(x <= 0.34,
      (2 * x + 2 - 1.68)^2 / 2,
      ifelse(x <= 0.84,
        1 - (1.68 - 2 * x)^2 / 2,
        1
      )
    )
  )
}
such that the resulting plot shows both step functions on the same axes.
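Since the original goal was to check whether one function is uniformly higher than the other, a quick numerical check on a fine grid (a sketch using the vectorized f and g defined just above) can complement the plot:
xs <- seq(-1, 2, by = 0.001)
all( f(xs) >= g(xs) )    # TRUE only if f is nowhere below g
all( g(xs) >= f(xs) )    # TRUE only if g is nowhere below f
range( f(xs) - g(xs) )   # the sign(s) of the differences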


Plotting same equation but different graph

I tried to plot the Taylor expansion of the exponential function, f(x) = exp(x), at a = 0 with n = 1.
At first I wrote the version labelled #### n = 1 below and plotted it, but it failed to fit the line to the graph of exp(x). Then I moved the '+' sign up to the end of the previous line, as in the version labelled # n = 1, and it worked.
The two definitions are exactly the same, except for the location of the operator sign (+).
What is the difference between the two versions (#### n = 1 and # n = 1) when plotting?
Does the location of operator signs matter in plotting?
f1 <- function(x) exp( x )
x <- seq( -1, 1, by = 0.025 )
a <- 0
#### n = 1
f1.10 <- function( x ){
  exp( a ) / factorial( 0 ) * ( x - a )^0
  + exp( a ) / factorial( 1 ) * ( x - a )^1
}
# n = 1
f1.1 <- function( x ){
  exp( a ) / factorial( 0 ) * ( x - a )^0 +
    exp( a ) / factorial( 1 ) * ( x - a )^1
}
plot( x, exp(x), ylab = "exp(x)", type = "l", lwd = 3 )
curve( f1.1, -0.5, 0.5, add = T, type = "l", lwd = 2, col = "yellow" )
curve( f1.10, -0.5, 0.5, add = T, type = "l", lwd = 2, col = "blue" )
In R, a + at the head of a line starts a new expression rather than continuing the previous one. If you put the + at the end of a line instead, R recognizes the next line in the script as a continuation of the current expression.
So,
f1.10 <- function( x ){
  exp( a ) / factorial( 0 ) * ( x - a )^0
  + exp( a ) / factorial( 1 ) * ( x - a )^1
}
is the same as
f1.11 <- function( x ){
  exp( a ) / factorial( 1 ) * ( x - a )^1
}
The two functions produce different results, and they're not the same.
f1.10(seq(10))
#[1] 1 2 3 4 5 6 7 8 9 10
f1.1(seq(10))
#[1] 2 3 4 5 6 7 8 9 10 11
When x is 1, f1.10 produces 1 while f1.1 produces 2. We can also draw random numbers between -0.5 and 0.5, as follows, to check whether the two functions produce the same results.
set.seed(777)
x <- runif(100, min = -0.5, max = 0.5)
min(f1.10(x))
#[1] -0.4940781
max(f1.10(x))
#[1] 0.4950499
min(f1.1(x))
#[1] 0.5059219
max(f1.1(x))
#[1] 1.49505
You can see that f1.10 stays between -0.5 and 0.5 (it simply returns x), whereas f1.1 does not.
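A quick way to see the parsing difference directly (a small illustrative check, not part of the original answer) is to print the two function bodies:
body(f1.10)   # two separate statements; only the value of the last one is returned
body(f1.1)    # a single expression: the sum of both terms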

Sequence labeller

I am analysing a time-series signal. I set a threshold to separate the signal from the baseline noise. In order to identify the properties of each signal sequence (duration, amplitude, maximum signal, ...), I built a function that aggregates all contiguous signal points into separate "peaks". Although this function does what I want, I was wondering if anyone can help me make it more efficient (e.g. by vectorization), because I aim to run the function on a data.table of more than 1M rows. Here is some sample data with the function:
# Generate dummy data
library(zoo)       # for zoo()
library(ggplot2)   # used for the plots below
x <- sin(seq(from = 0, to = 20, length.out = 200)) + rnorm(200, 0, 0.1)
x <- zoo(x)
plot(x)
# Label each point as signal (1) or noise (0)
y <- ifelse(x > 0.5, 1, 0)
# Function to label each peak
peak_labeler <- function(x) {
  tmp <- NULL
  for (i in seq_along(x)) {
    if (x[i] == 0) { tmp[i] <- 0 }   # If baseline, mark as 0
    if (x[i] == 1) {
      # If x[i] belongs to a peak
      if (i == 1) {
        tmp[i] <- 1                  # Label as 1 at t0
      } else {
        if (!exists("Peak")) { Peak <- 0 }
        if (x[i - 1] == 0) {
          # If the previous point is not a peak, start a new peak
          Peak <- Peak + 1
          tmp[i] <- Peak
        }
        if (x[i - 1] == 1) {
          tmp[i] <- Peak
        }
      }
    }
  }
  return(tmp)
  rm(tmp, Peak, i)   # Garbage collection (note: never reached, as it follows return())
}
# Label peaks
dummy <- data.frame(t = 1:200, x,y,tmp = peak_labeler(y))
# Show data
ggplot(dummy, aes(x = t, y = x)) +
geom_point(aes(col = as.factor(tmp), group = 1))
Here's an approach using dplyr.
The test in the cross_threshold line works by evaluating whether y is on a different side of 0.5 than the prior y. If so, the sign of the two terms y - threshold and lag(y) - threshold will be different, leading to a TRUE, which is multiplied by 1 to become 1. If they're on the same side of 0.5, you'll get a FALSE and a 0. The default = 0 part deals with the first line, where lag(y) is undefined. Then we add up how many cumulative crosses there have been to define the tmp group.
library(dplyr)
threshold = 0.5
dummy <- data.frame(t = 1:200, x, y) %>%
  mutate(cross_threshold = 1 * (sign(y - threshold) != sign(lag(y, default = 0) - threshold)),
         # Line above now optional, just if we want to label all crossings
         up = 1 * ((y > threshold) & (lag(y) < threshold)),
         tmp = if_else(y > threshold, cumsum(up), 0))
ggplot(dummy, aes(x = t, y = x)) +
  geom_point(aes(col = as.factor(tmp), group = 1)) +
  geom_point(data = filter(dummy, cross_threshold == 1), shape = 21, size = 5)
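A base R alternative (a sketch, not part of the dplyr answer above; peak_labeler_rle is just an illustrative name) labels runs of consecutive 1s with rle() and avoids the explicit loop:
peak_labeler_rle <- function(y) {
  r <- rle(as.numeric(y))
  peak_id <- cumsum(r$values)          # increases by 1 at the start of each run of 1s
  rep(peak_id * r$values, r$lengths)   # runs of 0s stay labelled 0
}
# For the dummy data above this reproduces the labels of the original function:
# identical(peak_labeler_rle(y), peak_labeler(y))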

intersection of two step functions

I am trying to determine the point (x, y) where two functions intersect. The functions are step interpolations between sets of points. One function is weakly increasing (v1) and the other weakly decreasing (v2). I'm coding in R, but a general algorithm is also fine.
If it helps, this is to determine a market equilibrium from sets of supply and demand points.
The two vectors have different lengths, and their x and y values will not be the same.
Some example data:
set.seed(4)
v1 = data.frame( y = cumsum( runif(10) ) ,
x = cumsum( runif(10) ) )
v2 = data.frame( y = 5-cumsum( runif(8) ) ,
x = cumsum( runif(8) ) )
plot(y=0,x=0,type="n",xlim=c(0,5),ylim=c(0,5),xlab="x",ylab="y")
lines( y=v1$y , x=v1$x , type="S" , col="blue" )
lines( y=v1$y , x=v1$x , type="p" , col="blue" )
lines( y=v2$y , x=v2$x , type="s" , col="red" )
lines( y=v2$y , x=v2$x , type="p" , col="red" )
In this example, the intersection is at (x=2.7275363 , y=2.510405), where the x is from v2 and y is from v1.
Thanks
As I was facing the same issue but depended on speed, I used the wonderful Rcpp to speed the code up. If anybody is interested, this is what I did:
library(dplyr) # for data manipulation only, not used for the algorithm!
library(ggplot2) # for data graphing only, not used for the algorithm!
# Load (i.e., Source the Cpp function)
Rcpp::sourceCpp("find_optimum.cpp")
# small helper function that plots the supply and demand as a step-function
plot_supply_demand <- function(supply, demand) {
  supply_df <- supply %>%
    bind_rows(data_frame(p = -Inf, q = 0)) %>%
    arrange(p) %>%
    mutate(agg_q = cumsum(q), side = "supply") %>%
    bind_rows(data_frame(p = Inf, q = 0, agg_q = sum(supply$q), side = "supply"))
  demand_df <- demand %>%
    bind_rows(data_frame(p = Inf, q = 0)) %>%
    arrange(desc(p)) %>%
    mutate(agg_q = cumsum(q), side = "demand") %>%
    bind_rows(data_frame(p = -Inf, q = 0, agg_q = sum(demand$q), side = "demand"))
  ggplot(mapping = aes(x = p, y = agg_q, color = side)) +
    geom_step(data = demand_df, direction = "vh") +
    geom_step(data = supply_df)
}
# create two data_frames containing the disaggregated data (i.e., orders)
# by graphing the data, or by calculating it by hand we see the optimum at (10, 2)
supply_small = data_frame(p = c(8, 10),
q = c(1, 2))
demand_small = data_frame(p = c(12, 10, 8),
q = c(1, 1, 1))
plot_supply_demand(supply_small, demand_small) +
geom_point(aes(x = 10, y = 2), color = "red", size = 5)
find_optimum(supply_small$p, supply_small$q, demand_small$p, demand_small$q)
#> $price
#> [1] 10
#>
#> $quantity
#> [1] 2
Larger example
set.seed(12345678)
demand <- data_frame(p = runif(100, 80, 200), q = rnorm(100, 10, 2))
supply <- data_frame(p = runif(100, 0, 120), q = rnorm(100, 10, 2))
opt <- find_optimum(supply$p, supply$q, demand$p, demand$q)
opt
#> $price
#> [1] 102.5982
#>
#> $quantity
#> [1] 841.8772
plot_supply_demand(supply, demand) +
geom_point(aes(x = opt$price, y = opt$quantity), color = "red", size = 2)
To zoom in a bit on the optimum, we can use the following
plot_supply_demand(supply, demand) +
geom_point(aes(x = opt$price, y = opt$quantity), color = "red", size = 2) +
xlim(opt$price + c(-10, 10)) + ylim(opt$quantity + c(-50, 50))
#> Warning: Removed 92 rows containing missing values (geom_path).
#> Warning: Removed 93 rows containing missing values (geom_path).
Created on 2018-10-20 by the reprex package (v0.2.0).
Rcpp Function
And last but not least, the C++ function in find_optimum.cpp that does the heavy lifting:
#include <Rcpp.h>
#include <map>

// [[Rcpp::export]]
Rcpp::List find_optimum(Rcpp::NumericVector price_supply,
                        Rcpp::NumericVector quant_supply,
                        Rcpp::NumericVector price_demand,
                        Rcpp::NumericVector quant_demand) {
  std::map<double, double> supply;
  std::map<double, double> demand;
  // fill the maps
  for (int i = 0; i < price_supply.size(); ++i) {
    supply[price_supply[i]] += quant_supply[i];
  }
  for (int i = 0; i < price_demand.size(); ++i) {
    demand[price_demand[i]] += quant_demand[i];
  }
  if (supply.empty() || demand.empty())
    return Rcpp::List::create(Rcpp::Named("price") = 0, Rcpp::Named("quantity") = 0);
  auto sIt = supply.begin(), nextS = std::next(sIt, 1);
  const auto endS = supply.end();
  auto dIt = demand.rbegin(), nextD = std::next(dIt, 1);
  const auto endD = demand.rend();
  // quantity and prices at either side
  double pS = sIt->first, pD = dIt->first;
  double qS = 0, qD = 0;
  // next prices
  double nextPS = nextS->first, nextPD = nextD->first;
  if (pD < pS)
    return Rcpp::List::create(Rcpp::Named("price") = 0, Rcpp::Named("quantity") = 0);
  // add the best price from each side!
  qS += sIt->second;
  qD += dIt->second;
  while (pS < pD) {
    if (nextS == endS && nextD == endD) {
      pD = qD < qS ? pS : pD;
      break;
    }
    while (qS <= qD && sIt != endS && nextS->first <= pD) {
      ++sIt;
      ++nextS;
      pS = sIt->first;
      qS += sIt->second;
    }
    if (sIt == endS) break;
    if (nextD->first < pS) {
      pD = qD < qS ? pS : pD;
      break;
    }
    while (qD < qS && dIt != endD && nextD->first >= pS) {
      ++dIt;
      ++nextD;
      pD = dIt->first;
      qD += dIt->second;
    }
    if (dIt == endD) break;
  }
  double price = pD;
  double vol = qS < qD ? qS : qD;
  return Rcpp::List::create(Rcpp::Named("price") = price,
                            Rcpp::Named("quantity") = vol);
}
You're drawing your step lines differently in each case: for v1 you change the vertical first and then the horizontal (up and across), whereas for v2 you reverse the order (across then down). Assuming this is correct, your intersection point will be at or immediately after a point in v1 where the next point along the x axis has a lower y coordinate (which, since v1 is weakly increasing, must come from v2). We can find that by doing:
v1$v <- 1
v2$v <- 2
v3 <- rbind(v1,v2)
v3 <- v3[order(v3$x),]
v3$diff <- c( diff(v3$y),0)
ind <- which(v3$diff < 0 & v3$v ==1)[1]
There are now two distinct cases: the intersection could be on the horizontal or on the vertical arm following this point of v1. It will be on the horizontal arm if the immediately preceding v2 point is higher than the v1 point after the one we found; otherwise it will be on the vertical arm. This is clear if you draw it out - I'll try and attach an image if you don't see this.
previousV2 <- tail(which(v3$v[1:ind] == 2), 1)
nextV1 <- which(v3$v[-(1:ind)] == 1)[1] + ind
if (v3$y[previousV2] > v3$y[nextV1]) {
  x <- v3$x[ind+1]
  y <- v3$y[nextV1]
} else {
  x <- v3$x[ind]
  y <- v3$y[previousV2]
}
Worryingly, this doesn't agree with your (x=2.7275363, y=2.510405) answer, but when I plot it, mine appears on the intersection. So either I haven't understood what you want, you've miscalculated, or there's a different scheme regarding the order of the horizontal and vertical components. The above code should be adaptable to different schemes.
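For reference, overlaying the x and y computed above on the question's plot (a small sketch, assuming that plot window is still open):
points( x, y, pch = 19, col = "darkgreen" )
abline( v = x, h = y, lty = 3, col = "darkgreen" )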
I seem to have something that works, but it's a lot more complicated than I was expecting.
First, let me define a helper function
between <- function(x, a, b) {
  if(missing(b)) {
    if(length(a) == 2) {
      a <- t(a)
    }
  } else {
    a <- unname(cbind(a, b))
  }
  a <- t(apply(a, 1, sort))
  a[,1] <= x & x <= a[,2]
}
This just helps to check whether a number is between two others. Now I will embed the two data.frames to make sets of consecutive point pairs, then I check each possible combination for segments that overlap in just the right way. (It's important that v1 here is the "S" and v2 is the "s".)
sa <- embed(as.matrix(v1[,c("x","y")]), 2)
sz <- embed(as.matrix(v2[,c("x","y")]), 2)
xx <- outer(1:nrow(sa), 1:nrow(sz), function(a,z)
  (between(sa[a,2], sz[z,c(2,4)]) & between(sz[z,1], sa[a,c(1,3)])) * 1
  + (between(sz[z,4], sa[a,c(2,4)]) & between(sa[a,3], sz[z,c(1,3)])) * 2
)
Now xx contains the matching set of points; I just need to extract the correct coordinates depending on which type of intersection occurred.
i <- which(xx != 0, arr.ind = TRUE)
int.pt <- if(nrow(i) > 0 && ncol(i) == 2) {
  if(xx[i] == 1) {
    c(sz[i[2],1], sa[i[1],2])
  } else if (xx[i] == 2) {
    c(sa[i[1],3], sz[i[2],4])
  }
} else {
  c(NA, NA)
}
# optionally plot the intersection
# if (all(!is.na(int.pt))) {
#   points(int.pt[1], int.pt[2], pch=20, col="black")
#   abline(v=int.pt[1], h=int.pt[2], lty=2)
# }
Perhaps there is a better way, but at least you have another method that seems to work to compare answers with.
I had another think about the problem. A key issue is that I need to find the intersection within an optimisation routine, so it has to be fast. So I came up with the following (included here in case others have the same problem in the future). It is a modified Bentley-Ottmann algorithm.
# create some data
supply = data.frame( p = cumsum( runif(1000) ) ,
q = cumsum( runif(1000) ) )
demand = data.frame( p = tail(supply,1)$p - cumsum( runif(1000) ) ,
q = cumsum( runif(1000) ) )
# create tables that identify coordinates of horizontal and vertical lines
demand.h = cbind( p = head(demand,-1)$p ,
q.lower = head(demand,-1)$q ,
q.upper = tail(demand,-1)$q )
supply.v = cbind( q = head(supply,-1)$q ,
p.lower = head(supply,-1)$p ,
p.upper = tail(supply,-1)$p )
demand.v = cbind( q = tail(demand,-1)$q ,
p.lower = tail(demand,-1)$p ,
p.upper = head(demand,-1)$p )
supply.h = cbind( p = tail(supply,-1)$p ,
q.lower = head(supply,-1)$q ,
q.upper = tail(supply,-1)$q )
# define a function
find.intersection = function( f.A , f.B ){
  f.result = any( f.B[,2] <= f.A[1] & f.B[,3] >= f.A[1] &
                  f.A[2] <= f.B[,1] & f.A[3] >= f.B[,1] )
  return( f.result )
}
# find the intersection
intersection.h = c( demand.h[ apply( demand.h ,
MARGIN=1 ,
FUN=find.intersection ,
supply.v ) , 1 ] ,
supply.v[ apply( supply.v ,
MARGIN=1 ,
FUN=find.intersection ,
demand.h ) , 1 ] )
intersection.v = c( supply.h[ apply( supply.h ,
MARGIN=1 ,
FUN=find.intersection ,
demand.v ) , 1 ] ,
demand.v[ apply( demand.v ,
MARGIN=1 ,
FUN=find.intersection ,
supply.h ) , 1 ] )
intersection = c( intersection.h , intersection.v )
# (optional) if you want to print the graph and intersection
plot(y=0,x=0,type="n",
xlim=c(intersection[2]-1,intersection[2]+1),
ylim=c(intersection[1]-1,intersection[1]+1),
xlab="q",ylab="p")
lines( y=supply$p , x=supply$q , type="S" , col="black" )
lines( y=supply$p , x=supply$q , type="p" , col="black" )
lines( y=demand$p , x=demand$q , type="s" , col="black" )
lines( y=demand$p , x=demand$q , type="p" , col="black" )
points(intersection[2],intersection[1], pch=20, col="red")
abline( v=intersection[2], h=intersection[1], lty=2 , col="red")
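Since speed was the motivation, system.time() gives a quick feel for the cost on the 1000-point example above; this times just one of the four apply() scans (an illustrative check, not part of the original answer):
system.time(
  apply( demand.h , MARGIN = 1 , FUN = find.intersection , supply.v )
)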

R - Plot a region described by planes with rgl

I want to plot a polyhedron, which is described by the following inequalities:
3*x+5*y+9*z<=500
4*x+5*z<=350
2*y+3*z<=150
x,y,z>=0
It is a linear program. The objective function is:
4*x+3*y+6*z
The polyhedron is the feasible region for this program.
I am able to plot the inequalities as planes, which should describe the polyhedron (note that this is my first try with rgl, so the code is kinda messy; if you want to improve it, please feel free to do so):
# setup
x <- seq(0,9,length=20)*seq(0,9,length=20)
y <- x
t <- x
f1 <- function(x,y){y=70-0.8*x}
z1 <- outer(x,y,f1)
f2 <- function(x,y){500/9-x/3-(5*y)/9}
z2 <- outer(x,y,f2)
f3 <- function(x,y){t=50-(2*y)/3}
z3 <- outer(x,y,f3)
# plot planes with rgl
uM = matrix(c(0.72428817, 0.03278469, -0.68134511, 0,
-0.6786808, 0.0555667, -0.7267077, 0,
0.01567543, 0.99948466, 0.05903265, 0,
0, 0, 0, 1),
4, 4)
library(rgl)
open3d(userMatrix = uM, windowRect = c(0, 0, 400, 400))
rgl.pop("lights")
light3d(diffuse='white',theta=0,phi=20)
light3d(diffuse="gray10", specular="gray25")
rgl.light(theta = 0, phi = 0, viewpoint.rel = TRUE, ambient = "#FFFFFF",
diffuse = "#FFFFFF", specular = "#FFFFFF", x=30, y=30, z=40)
rgl.light(theta = 0, phi = 0, viewpoint.rel = TRUE, ambient = "#FFFFFF",
diffuse = "#FFFFFF", specular = "#FFFFFF", x=0, y=0, z=0)
bg3d("white")
material3d(col="white")
persp3d(x,y,z3,
xlim=c(0,100), ylim=c(0,100), zlim=c(0,100),
xlab='x', ylab='y', zlab='z',
col='lightblue',
ltheta=100, shade=0, ticktype = "simple")
surface3d(x, y, z2, col='orange', alpha=1)
surface3d(t, y, z1, col='pink', alpha=1, smooth=TRUE)
Now I want to plot the region that is described by the planes with
x,y,z>=0.
But I don't know how to do it. I tried to do it like this:
x <- seq(0,9,length=20)*seq(0,9,length=20)
y <- x
z <- x
f4 <- function(x,y,t){
  cond1 <- 3*x+5*y+9*z <= 500
  cond2 <- 4*x+5*z <= 350
  cond3 <- 2*y+3*z <= 150
  ifelse(cond1, 3*x+5*y+9*z,
    ifelse(cond2, 4*x+5*z,
      ifelse(cond3, 2*y+3*z, 0)))
}
f4(x,y,z)
z4 <- outer(x,y,z,f4) # ERROR
But this is the point where I'm stuck. outer() is defined only for 2 variables, but I have three. How can I move on from here?
You can compute the vertices of the polyhedron by intersecting the planes three at a time (some of those intersections are outside the polyhedron because of the other inequalities: you have to check those as well).
Once you have the vertices, you can try to connect them.
To identify which segments are edges of the polyhedron, you can take the midpoint of each segment and check whether at least two of the inequalities are satisfied there as equalities.
# Write the inequalities as: planes %*% c(x,y,z,1) <= 0
planes <- matrix( c(
   3,  5,  9, -500,
   4,  0,  5, -350,
   0,  2,  3, -150,
  -1,  0,  0,    0,
   0, -1,  0,    0,
   0,  0, -1,    0
), nc = 4, byrow = TRUE )
# Compute the vertices
n <- nrow(planes)
vertices <- NULL
for( i in 1:n )
  for( j in 1:n )
    for( k in 1:n )
      if( i < j && j < k ) try( {
        # Intersection of the planes i, j, k
        vertex <- solve( planes[c(i,j,k),-4], -planes[c(i,j,k),4] )
        # Check that it is indeed in the polyhedron
        if( all( planes %*% c(vertex,1) <= 1e-6 ) ) {
          print(vertex)
          vertices <- rbind( vertices, vertex )
        }
      } )
# For each pair of points, check if the segment is on the boundary, and draw it
library(rgl)
open3d()
m <- nrow(vertices)
for( i in 1:m )
  for( j in 1:m )
    if( i < j ) {
      # Middle of the segment
      p <- .5 * vertices[i,] + .5 * vertices[j,]
      # Check if it is at the intersection of two planes
      if( sum( abs( planes %*% c(p,1) ) < 1e-6 ) >= 2 )
        segments3d( vertices[c(i,j),] )
    }
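Since the question frames this as a linear program with objective 4*x+3*y+6*z, one extra step (a sketch, not part of the answer above) is to evaluate the objective at the computed vertices; for a bounded feasible region like this one, the optimum of the LP is attained at a vertex:
obj <- as.vector( vertices %*% c(4, 3, 6) )
vertices[ which.max(obj), ]   # vertex with the largest objective value (use which.min to minimise)
max(obj)                      # the corresponding value of 4*x + 3*y + 6*z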

Predict values from sinusoidal noise

Background
Using R to predict the next values in a series.
Problem
The following code generates and plots a model for a curve with some uniform noise:
slope = 0.55
offset = -0.5
amplitude = 0.22
frequency = 3
noise = 0.75
x <- seq( 0, 200 )
y <- offset + (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
yn <- y + (noise * runif( length( x ) ))
library(mgcv)   # assuming gam() and s() come from the mgcv package
gam.object <- gam( yn ~ s( x ) + 0 )
plot( gam.object, col = rgb( 1.0, 0.392, 0.0 ) )
points( x, yn, col = rgb( 0.121, 0.247, 0.506 ) )
The model reveals the trend, as expected. The trouble is predicting subsequent values:
p <- predict( gam.object, data.frame( x=201:210 ) )
The predictions do not look correct when plotted:
df <- data.frame( fit=c( fitted( gam.object ), p ) )
plot( seq( 1:211 ), df[,], col="blue" )
points( yn, col="orange" )
The predicted values (from 201 onwards) appear to be too low.
Questions
1. Are the predicted values, as shown, actually the most accurate predictions?
2. If not, how can the accuracy be improved?
3. What is a better way to concatenate the two data sets (fitted.values( gam.object ) and p)?
1. The simulated data is weird, because all the errors you add to the "true" y are greater than 0. (runif creates numbers on [0,1], not [-1,1].)
2. The problem disappears when an intercept term in the model is allowed. For example:
gam.object2 <- gam( yn ~ s( x ))
p2 <- predict( gam.object2, data.frame( x=201:210 ))
points( 1:211, c( fitted( gam.object2 ), p2), col="green")
3. The reason for the systematic underestimation in the model without intercept could be that gam uses a sum-to-zero constraint on the estimated smooth functions. I think point 2 answers your first and second questions.
4. Your third question needs clarification, because a gam object is not a data.frame; the two data types do not mix.
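That said, if the aim is simply to combine the fitted values and the predictions for plotting, one option (a sketch using the objects from the question, with illustrative column names) is to build a single data.frame:
combined <- data.frame(
  x      = 0:210,
  value  = c( fitted( gam.object ), as.numeric( p ) ),
  source = rep( c("fitted", "predicted"), c(201, 10) )
)
plot( combined$x, combined$value, col = ifelse(combined$source == "fitted", "blue", "red") )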
A more complete example:
slope = 0.55
amplitude = 0.22
frequency = 3
noise = 0.75
x <- 1:200
y <- (slope * x / 100) + (amplitude * sin( frequency * x / 100 ))
ynoise <- y + (noise * runif( length( x ) ))
gam.object <- gam( ynoise ~ s( x ) )
p <- predict( gam.object, data.frame( x = 1:210 ) )
plot( p, col = rgb( 0, 0.75, 0.2 ) )
points( x, ynoise, col = rgb( 0.121, 0.247, 0.506 ) )
points( fitted( gam.object ), col = rgb( 1.0, 0.392, 0.0 ) )
