Simplifying a cubefile parser - pyparsing

I am trying to parse cubefiles, i.e. something like this:
Cube file format
Generated by MRChem
1 -1.500000e+01 -1.500000e+01 -1.500000e+01 1
10 3.333333e+00 0.000000e+00 0.000000e+00
10 0.000000e+00 3.333333e+00 0.000000e+00
10 0.000000e+00 0.000000e+00 3.333333e+00
2 2.000000e+00 0.000000e+00 0.000000e+00 0.000000e+00
4.9345e-14 3.5148e-13 1.5150e-12 3.8095e-12 6.1568e-12 6.1568e-12
3.8095e-12 1.5150e-12 3.5148e-13 4.9344e-14 3.5148e-13 2.3779e-12
1.0450e-11 3.0272e-11 5.4810e-11 5.4810e-11 3.0272e-11 1.0450e-11
My current parser is as follows:
import pyparsing as pp
# define simplest bits
int_t = pp.pyparsing_common.signed_integer
float_t = pp.pyparsing_common.sci_real
str_t = pp.Word(pp.alphanums)
# comments: the first two lines of the file
comment_t = pp.OneOrMore(str_t, stopOn=int_t)("comment")
# preamble: cube axes and molecular geometry
preamble_t = ((int_t + pp.OneOrMore(float_t) + int_t)
              + (int_t + float_t + float_t + float_t)
              + (int_t + float_t + float_t + float_t)
              + (int_t + float_t + float_t + float_t)
              + (int_t + float_t + float_t + float_t + float_t))("preamble")
# voxel data: volumetric data on cubic grid
voxel_t = pp.delimitedList(float_t, delim=pp.Empty())("voxels")
# the whole parser
cube_t = comment_t + preamble_t + voxel_t
The code above works, but can it be improved? The definition of preamble_t in particular seems like it could be done more elegantly. I have not managed it, though: my attempts thus far have only resulted in non-working parsers.
UPDATE
Following the answer and the further suggestion on rolling my own countedArray, this is what I have now:
import pyparsing as pp
int_t = pp.pyparsing_common.signed_integer
nonzero_uint_t = pp.Word("123456789", pp.nums).setParseAction(pp.pyparsing_common.convertToInteger)
nonzero_int_t = pp.Word("+-123456789", pp.nums).setParseAction(lambda t: abs(int(t[0])))
float_t = pp.pyparsing_common.sci_real
str_t = pp.Word(pp.printables)
coords = pp.Group(float_t * 3)
axis_spec = pp.Group(int_t("nvoxels") + coords("vector"))
geom_field = pp.Group(int_t("atomic_number") + float_t("charge") + coords("position"))
def axis_spec_t(d):
    return pp.Group(nonzero_uint_t("n_voxels") + coords("vector"))(f"{d.upper()}AXIS")
geom_field_t = pp.Group(nonzero_uint_t("ATOMIC_NUMBER") + float_t("CHARGE") + coords("POSITION"))
before = pp.Group(float_t * 3)("ORIGIN") + pp.Optional(nonzero_uint_t, default=1)("NVAL") + axis_spec_t("x") + axis_spec_t("y") + axis_spec_t("z")
after = pp.Optional(pp.countedArray(pp.pyparsing_common.integer))("DSET_IDS").setParseAction(lambda t: t[0] if len(t) !=0 else t)
def preamble_t(pre, post):
    preamble_expr = pp.Forward()
    def count(s, l, t):
        n = t[0]
        preamble_expr << (n and (pre + pp.Group(pp.And([geom_field_t] * n))("GEOM") + post) or pp.Group(pp.Empty()))
        return []
    natoms_expr = nonzero_int_t("NATOMS")
    natoms_expr.addParseAction(count, callDuringTry=True)
    return natoms_expr + preamble_expr
w_nval = ["""3 -5.744767 -5.744767 -5.744767 1
80 0.143619 0.000000 0.000000
80 0.000000 0.143619 0.000000
80 0.000000 0.000000 0.143619
8 8.000000 0.000000 0.000000 0.000000
1 1.000000 0.000000 1.400000 1.100000
1 1.000000 0.000000 -1.400000 1.100000
2.21546E-05 2.47752E-05 2.76279E-05 3.07225E-05 3.40678E-05 3.76713E-05
4.15391E-05 4.56756E-05 5.00834E-05 5.47629E-05 5.97121E-05 6.49267E-05
7.03997E-05 7.61211E-05 8.20782E-05 8.82551E-05 9.46330E-05 1.01190E-04
1.07900E-04 1.14736E-04 1.21667E-04 1.28660E-04 1.35677E-04 1.42680E-04
1.49629E-04 1.56482E-04 1.63195E-04 1.69724E-04 1.76025E-04 1.82053E-04
1.87763E-04 1.93114E-04 1.98062E-04 2.02570E-04 2.06601E-04 2.10120E-04
""", """-3 -12.368781 -12.368781 -12.143417 92
80 0.313134 0.000000 0.000000
80 0.000000 0.313134 0.000000
80 0.000000 0.000000 0.313134
8 8.000000 0.000000 0.000000 0.225363
1 1.000000 0.000000 1.446453 -0.901454
1 1.000000 -0.000000 -1.446453 -0.901454
92 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92
-1.00968E-10 -3.12856E-09 3.43398E-09 -8.36581E-09 -3.70577E-14 9.20035E-07
-3.78355E-06 -2.09418E-06 -9.41686E-13 -1.21366E-06 -4.87958E-06 3.50133E-06
-5.61999E-07 3.54869E-18 -1.30008E-12 -9.48885E-07 -1.44839E-06 -1.68959E-06
-3.21975E-06 -2.48399E-06 -5.12012E-07 -1.60147E-07 -9.88842E-13 -3.77732E-18
"""
]
for test in w_nval:
    res = preamble_t(before, after).parseString(test).asDict()
    print(f"{res=}")
wo_nval = ["""-3 -12.368781 -12.368781 -12.143417
80 0.313134 0.000000 0.000000
80 0.000000 0.313134 0.000000
80 0.000000 0.000000 0.313134
8 8.000000 0.000000 0.000000 0.225363
1 1.000000 0.000000 1.446453 -0.901454
1 1.000000 -0.000000 -1.446453 -0.901454
92 1 2 3 4 5 6 7 8 9
10 11 12 13 14 15 16 17 18 19
20 21 22 23 24 25 26 27 28 29
30 31 32 33 34 35 36 37 38 39
40 41 42 43 44 45 46 47 48 49
50 51 52 53 54 55 56 57 58 59
60 61 62 63 64 65 66 67 68 69
70 71 72 73 74 75 76 77 78 79
80 81 82 83 84 85 86 87 88 89
90 91 92
-1.00968E-10 -3.12856E-09 3.43398E-09 -8.36581E-09 -3.70577E-14 9.20035E-07
-3.78355E-06 -2.09418E-06 -9.41686E-13 -1.21366E-06 -4.87958E-06 3.50133E-06
-5.61999E-07 3.54869E-18 -1.30008E-12 -9.48885E-07 -1.44839E-06 -1.68959E-06
-3.21975E-06 -2.48399E-06 -5.12012E-07 -1.60147E-07 -9.88842E-13 -3.77732E-18
""",
"""3 -5.744767 -5.744767 -5.744767
80 0.143619 0.000000 0.000000
80 0.000000 0.143619 0.000000
80 0.000000 0.000000 0.143619
8 8.000000 0.000000 0.000000 0.000000
1 1.000000 0.000000 1.400000 1.100000
1 1.000000 0.000000 -1.400000 1.100000
2.21546E-05 2.47752E-05 2.76279E-05 3.07225E-05 3.40678E-05 3.76713E-05
4.15391E-05 4.56756E-05 5.00834E-05 5.47629E-05 5.97121E-05 6.49267E-05
7.03997E-05 7.61211E-05 8.20782E-05 8.82551E-05 9.46330E-05 1.01190E-04
1.07900E-04 1.14736E-04 1.21667E-04 1.28660E-04 1.35677E-04 1.42680E-04
1.49629E-04 1.56482E-04 1.63195E-04 1.69724E-04 1.76025E-04 1.82053E-04
1.87763E-04 1.93114E-04 1.98062E-04 2.02570E-04 2.06601E-04 2.10120E-04
"""]
for test in wo_nval:
    res = preamble_t(before, after).parseString(test).asDict()
    print(f"{res=}")
This works for the w_nval test cases (where the NVAL token is present). This token is, however, optional: parsing the wo_nval test cases fails, even though I am using the Optional token. Furthermore, the NATOMS token is not saved to the final dictionary. Is there a way to also save the counter in the countedArray implementation?
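As it turns out (see UPDATE 2 below), the Optional fails because pyparsing treats newlines as ordinary whitespace, so the bare optional integer happily consumes the NVOXELS integer that starts the next line; guarding it with a negative lookahead on the line ending fixes this:
nval_t = pp.Optional(~pp.LineEnd() + nonzero_uint_t, default=1)("NVAL")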
UPDATE 2
This is the final, working parser:
import pyparsing as pp
# non-zero unsigned integer
nonzero_uint_t = pp.Word("123456789", pp.nums).setParseAction(pp.pyparsing_common.convertToInteger)
# non-zero signed integer
nonzero_int_t = pp.Word("+-123456789", pp.nums).setParseAction(lambda t: abs(int(t[0])))
# floating point numbers, can be in scientific notation
float_t = pp.pyparsing_common.sci_real
# NVAL token
nval_t = pp.Optional(~pp.LineEnd() + nonzero_uint_t, default=1)("NVAL")
# Cartesian coordinates
# it could be alternatively defined as: coords = pp.Group(float_t("x") + float_t("y") + float_t("z"))
coords = pp.Group(float_t * 3)
# row with molecular geometry
geom_field_t = pp.Group(nonzero_uint_t("ATOMIC_NUMBER") + float_t("CHARGE") + coords("POSITION"))
# volumetric data
voxel_t = pp.delimitedList(float_t, delim=pp.Empty())("DATA")
# specification of cube axes
def axis_spec_t(d):
    return pp.Group(nonzero_uint_t("NVOXELS") + coords("VECTOR"))(f"{d.upper()}AXIS")
before_t = pp.Group(float_t * 3)("ORIGIN") + nval_t + axis_spec_t("X") + axis_spec_t("Y") + axis_spec_t("Z")
# the parse action flattens the list
after_t = pp.Optional(pp.countedArray(pp.pyparsing_common.integer))("DSET_IDS").setParseAction(lambda t: t[0] if len(t) != 0 else t)
def preamble_t(pre, post):
    expr = pp.Forward()
    def count(s, l, t):
        n = t[0]
        expr << (geom_field_t * n)("GEOM")
        return n
    natoms_t = nonzero_int_t("NATOMS")
    natoms_t.addParseAction(count, callDuringTry=True)
    return natoms_t + pre + expr + post
cube_t = preamble_t(before_t, after_t) + voxel_t
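A quick usage sketch (my own check, assuming one of the w_nval test strings from the first update; the two comment lines at the top of a real cube file would still need to be consumed separately):
res = cube_t.parseString(w_nval[0]).asDict()
print(res["NATOMS"], res["XAXIS"], len(res["DATA"]))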

Wow, you are very fortunate to have such a clear reference of the format for this data. Usually this kind of documentation is left to guesswork and experimentation.
Since you have a good definition of the layout, I would define some more groups, and results names:
# define some common field groups
coords = pp.Group(float_t * 3)
# or coords = pp.Group(float_t("x") + float_t("y") + float_t("z"))
axis_spec = pp.Group(int_t("nvoxels") + coords("vector"))
geom_field = pp.Group(int_t("atomic_number") + float_t("charge") + coords("position"))
Then use them to define preamble and give it some more structure:
preamble_t = pp.Group(
    int_t("natoms")
    + coords("origin")
    + int_t("nval")
    + axis_spec("x_axis")
    + axis_spec("y_axis")
    + axis_spec("z_axis")
    + geom_field("geom")
)("preamble")
Now you can access the individual fields by name:
print(cube_t.parseString(sample).dump())
['Cube', 'file', 'format', 'Generated', 'by', 'MRChem', [1, [-15.0, -15.0, -15.0], 1, [10, [3.333333, 0.0, 0.0]], [10, [0.0, 3.333333, 0.0]], [10, [0.0, 0.0, 3.333333]], [2, 2.0, [0.0, 0.0, 0.0]]], 4.9345e-14, 3.5148e-13, 1.515e-12, 3.8095e-12, 6.1568e-12, 6.1568e-12, 3.8095e-12, 1.515e-12, 3.5148e-13, 4.9344e-14, 3.5148e-13, 2.3779e-12, 1.045e-11, 3.0272e-11, 5.481e-11, 5.481e-11, 3.0272e-11, 1.045e-11]
- comment: ['Cube', 'file', 'format', 'Generated', 'by', 'MRChem']
- preamble: [1, [-15.0, -15.0, -15.0], 1, [10, [3.333333, 0.0, 0.0]], [10, [0.0, 3.333333, 0.0]], [10, [0.0, 0.0, 3.333333]], [2, 2.0, [0.0, 0.0, 0.0]]]
- geom: [2, 2.0, [0.0, 0.0, 0.0]]
- atomic_number: 2
- charge: 2.0
- position: [0.0, 0.0, 0.0]
- natoms: 1
- nval: 1
- origin: [-15.0, -15.0, -15.0]
- x_axis: [10, [3.333333, 0.0, 0.0]]
- nvoxels: 10
- vector: [3.333333, 0.0, 0.0]
- y_axis: [10, [0.0, 3.333333, 0.0]]
- nvoxels: 10
- vector: [0.0, 3.333333, 0.0]
- z_axis: [10, [0.0, 0.0, 3.333333]]
- nvoxels: 10
- vector: [0.0, 0.0, 3.333333]
- voxels: [4.9345e-14, 3.5148e-13, 1.515e-12, 3.8095e-12, 6.1568e-12, 6.1568e-12, 3.8095e-12, 1.515e-12, 3.5148e-13, 4.9344e-14, 3.5148e-13, 2.3779e-12, 1.045e-11, 3.0272e-11, 5.481e-11, 5.481e-11, 3.0272e-11, 1.045e-11]
Extra credit: I see that the GEOM field should actually be repeated NATOMS times. Look at the code for countedArray to see how to make a self-modifying parser so that you can parse NATOMS x GEOM fields.
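For illustration, here is a minimal sketch of that self-modifying idea (my own distillation, not the library's actual code): a parse action on the leading integer redefines a Forward on the fly.
import pyparsing as pp

int_t = pp.pyparsing_common.integer

def counted(expr):
    arr = pp.Forward()
    def set_count(t):
        n = t[0]
        arr << pp.Group(expr * n)  # redefine the Forward once the count is known
        return []                  # drop the leading count from the results
    count = int_t.copy().addParseAction(set_count, callDuringTry=True)
    return count + arr

print(counted(int_t).parseString("3 10 20 30"))  # -> [[10, 20, 30]]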

Related

How do you get the posterior estimates from a stanfit object just as you would from a brmsfit object in R?

I am fairly new to R and Stan, and I would like to code my own model in Stan. The problem is that I don't know how to obtain the estimate__ values that conditional_effects(brmsfit) produces when using library(brms).
Here is an example of what I would like to obtain:
library(rstan)
library(brms)
N <- 10
y <- rnorm(10)
x <- rnorm(10)
df <- data.frame(x, y)
fit <- brm(y ~ x, data = df)
data <- conditional_effects(fit)
print(data[["x"]])
Which gives this output:
x y cond__ effect1__ estimate__ se__
1 -1.777412243 0.1417486 1 -1.777412243 0.08445399 0.5013894
2 -1.747889444 0.1417486 1 -1.747889444 0.08592914 0.4919022
3 -1.718366646 0.1417486 1 -1.718366646 0.08487412 0.4840257
4 -1.688843847 0.1417486 1 -1.688843847 0.08477227 0.4744689
5 -1.659321048 0.1417486 1 -1.659321048 0.08637019 0.4671830
6 -1.629798249 0.1417486 1 -1.629798249 0.08853233 0.4612196
7 -1.600275450 0.1417486 1 -1.600275450 0.08993511 0.4566040
8 -1.570752651 0.1417486 1 -1.570752651 0.08987979 0.4501722
9 -1.541229852 0.1417486 1 -1.541229852 0.09079337 0.4415650
10 -1.511707053 0.1417486 1 -1.511707053 0.09349952 0.4356073
11 -1.482184255 0.1417486 1 -1.482184255 0.09382594 0.4292237
12 -1.452661456 0.1417486 1 -1.452661456 0.09406637 0.4229115
13 -1.423138657 0.1417486 1 -1.423138657 0.09537000 0.4165933
14 -1.393615858 0.1417486 1 -1.393615858 0.09626168 0.4126735
15 -1.364093059 0.1417486 1 -1.364093059 0.09754818 0.4060894
16 -1.334570260 0.1417486 1 -1.334570260 0.09737763 0.3992320
17 -1.305047461 0.1417486 1 -1.305047461 0.09646332 0.3929951
18 -1.275524662 0.1417486 1 -1.275524662 0.09713718 0.3870211
19 -1.246001864 0.1417486 1 -1.246001864 0.09915170 0.3806628
20 -1.216479065 0.1417486 1 -1.216479065 0.10046754 0.3738948
21 -1.186956266 0.1417486 1 -1.186956266 0.10192677 0.3675363
22 -1.157433467 0.1417486 1 -1.157433467 0.10329695 0.3613282
23 -1.127910668 0.1417486 1 -1.127910668 0.10518868 0.3533583
24 -1.098387869 0.1417486 1 -1.098387869 0.10533191 0.3484098
25 -1.068865070 0.1417486 1 -1.068865070 0.10582833 0.3442075
26 -1.039342271 0.1417486 1 -1.039342271 0.10864510 0.3370518
27 -1.009819473 0.1417486 1 -1.009819473 0.10830692 0.3325785
28 -0.980296674 0.1417486 1 -0.980296674 0.11107417 0.3288747
29 -0.950773875 0.1417486 1 -0.950773875 0.11229667 0.3249769
30 -0.921251076 0.1417486 1 -0.921251076 0.11420108 0.3216303
31 -0.891728277 0.1417486 1 -0.891728277 0.11533604 0.3160908
32 -0.862205478 0.1417486 1 -0.862205478 0.11671013 0.3099456
33 -0.832682679 0.1417486 1 -0.832682679 0.11934724 0.3059504
34 -0.803159880 0.1417486 1 -0.803159880 0.12031792 0.3035792
35 -0.773637082 0.1417486 1 -0.773637082 0.12114301 0.2985330
36 -0.744114283 0.1417486 1 -0.744114283 0.12149371 0.2949334
37 -0.714591484 0.1417486 1 -0.714591484 0.12259197 0.2915398
38 -0.685068685 0.1417486 1 -0.685068685 0.12308763 0.2905327
39 -0.655545886 0.1417486 1 -0.655545886 0.12409683 0.2861451
40 -0.626023087 0.1417486 1 -0.626023087 0.12621634 0.2834400
41 -0.596500288 0.1417486 1 -0.596500288 0.12898609 0.2838938
42 -0.566977489 0.1417486 1 -0.566977489 0.12925969 0.2802667
43 -0.537454691 0.1417486 1 -0.537454691 0.13050938 0.2782553
44 -0.507931892 0.1417486 1 -0.507931892 0.12968382 0.2765127
45 -0.478409093 0.1417486 1 -0.478409093 0.13252478 0.2735946
46 -0.448886294 0.1417486 1 -0.448886294 0.13414535 0.2727640
47 -0.419363495 0.1417486 1 -0.419363495 0.13453109 0.2710725
48 -0.389840696 0.1417486 1 -0.389840696 0.13526957 0.2683500
49 -0.360317897 0.1417486 1 -0.360317897 0.13675913 0.2665745
50 -0.330795098 0.1417486 1 -0.330795098 0.13987067 0.2658021
51 -0.301272300 0.1417486 1 -0.301272300 0.14111051 0.2668740
52 -0.271749501 0.1417486 1 -0.271749501 0.14382292 0.2680711
53 -0.242226702 0.1417486 1 -0.242226702 0.14531118 0.2662193
54 -0.212703903 0.1417486 1 -0.212703903 0.14656473 0.2670958
55 -0.183181104 0.1417486 1 -0.183181104 0.14689102 0.2677249
56 -0.153658305 0.1417486 1 -0.153658305 0.14749250 0.2698547
57 -0.124135506 0.1417486 1 -0.124135506 0.14880275 0.2711767
58 -0.094612707 0.1417486 1 -0.094612707 0.15072864 0.2719037
59 -0.065089909 0.1417486 1 -0.065089909 0.15257772 0.2720895
60 -0.035567110 0.1417486 1 -0.035567110 0.15434018 0.2753563
61 -0.006044311 0.1417486 1 -0.006044311 0.15556588 0.2783308
62 0.023478488 0.1417486 1 0.023478488 0.15481341 0.2802336
63 0.053001287 0.1417486 1 0.053001287 0.15349716 0.2833364
64 0.082524086 0.1417486 1 0.082524086 0.15432904 0.2868926
65 0.112046885 0.1417486 1 0.112046885 0.15637411 0.2921039
66 0.141569684 0.1417486 1 0.141569684 0.15793097 0.2979247
67 0.171092482 0.1417486 1 0.171092482 0.15952338 0.3022751
68 0.200615281 0.1417486 1 0.200615281 0.15997047 0.3048768
69 0.230138080 0.1417486 1 0.230138080 0.16327957 0.3087545
70 0.259660879 0.1417486 1 0.259660879 0.16372900 0.3125599
71 0.289183678 0.1417486 1 0.289183678 0.16395417 0.3185642
72 0.318706477 0.1417486 1 0.318706477 0.16414444 0.3240570
73 0.348229276 0.1417486 1 0.348229276 0.16570600 0.3273931
74 0.377752075 0.1417486 1 0.377752075 0.16556032 0.3316680
75 0.407274873 0.1417486 1 0.407274873 0.16815162 0.3391713
76 0.436797672 0.1417486 1 0.436797672 0.16817144 0.3465403
77 0.466320471 0.1417486 1 0.466320471 0.16790241 0.3514764
78 0.495843270 0.1417486 1 0.495843270 0.16941330 0.3590708
79 0.525366069 0.1417486 1 0.525366069 0.17068468 0.3662851
80 0.554888868 0.1417486 1 0.554888868 0.17238535 0.3738123
81 0.584411667 0.1417486 1 0.584411667 0.17358253 0.3796033
82 0.613934466 0.1417486 1 0.613934466 0.17521059 0.3869863
83 0.643457264 0.1417486 1 0.643457264 0.17617046 0.3939509
84 0.672980063 0.1417486 1 0.672980063 0.17710931 0.3967577
85 0.702502862 0.1417486 1 0.702502862 0.17816611 0.4026686
86 0.732025661 0.1417486 1 0.732025661 0.17998354 0.4094216
87 0.761548460 0.1417486 1 0.761548460 0.18085939 0.4165644
88 0.791071259 0.1417486 1 0.791071259 0.18114271 0.4198687
89 0.820594058 0.1417486 1 0.820594058 0.18294576 0.4255245
90 0.850116857 0.1417486 1 0.850116857 0.18446785 0.4333511
91 0.879639655 0.1417486 1 0.879639655 0.18498697 0.4407155
92 0.909162454 0.1417486 1 0.909162454 0.18729221 0.4472631
93 0.938685253 0.1417486 1 0.938685253 0.18952720 0.4529227
94 0.968208052 0.1417486 1 0.968208052 0.19203126 0.4579841
95 0.997730851 0.1417486 1 0.997730851 0.19408999 0.4671136
96 1.027253650 0.1417486 1 1.027253650 0.19551024 0.4751111
97 1.056776449 0.1417486 1 1.056776449 0.19700981 0.4804208
98 1.086299247 0.1417486 1 1.086299247 0.19756573 0.4850098
99 1.115822046 0.1417486 1 1.115822046 0.20044626 0.4915511
100 1.145344845 0.1417486 1 1.145344845 0.20250046 0.4996890
lower__ upper__
1 -1.0567858 1.1982199
2 -1.0438136 1.1831539
3 -1.0228641 1.1707170
4 -1.0072313 1.1596104
5 -0.9864567 1.1438521
6 -0.9689320 1.1282532
7 -0.9505741 1.1173943
8 -0.9357609 1.0983966
9 -0.9230198 1.0859565
10 -0.9104617 1.0757511
11 -0.8874429 1.0631791
12 -0.8687644 1.0467475
13 -0.8513190 1.0348922
14 -0.8290140 1.0236083
15 -0.8126063 1.0166800
16 -0.7975146 1.0011153
17 -0.7869631 0.9873863
18 -0.7760327 0.9721754
19 -0.7551183 0.9585837
20 -0.7427828 0.9479480
21 -0.7269582 0.9405559
22 -0.7072756 0.9284436
23 -0.6975987 0.9161489
24 -0.6884648 0.9040642
25 -0.6684576 0.8923201
26 -0.6535668 0.8811996
27 -0.6517693 0.8714208
28 -0.6394743 0.8652541
29 -0.6235719 0.8542377
30 -0.6127188 0.8433206
31 -0.6017256 0.8346912
32 -0.5845027 0.8192662
33 -0.5701008 0.8098853
34 -0.5596900 0.7982326
35 -0.5473666 0.7980605
36 -0.5340069 0.7908127
37 -0.5239994 0.7826979
38 -0.5124559 0.7811926
39 -0.4986325 0.7786670
40 -0.5044564 0.7745791
41 -0.4940340 0.7699341
42 -0.4871297 0.7698303
43 -0.4808839 0.7678166
44 -0.4790951 0.7662335
45 -0.4711604 0.7576184
46 -0.4690302 0.7577330
47 -0.4675442 0.7567887
48 -0.4673520 0.7554134
49 -0.4649256 0.7499373
50 -0.4600178 0.7494690
51 -0.4500426 0.7500552
52 -0.4475863 0.7505488
53 -0.4437339 0.7513191
54 -0.4429276 0.7564214
55 -0.4427087 0.7578937
56 -0.4451014 0.7613821
57 -0.4418548 0.7706546
58 -0.4377409 0.7787030
59 -0.4397108 0.7882644
60 -0.4462651 0.8026011
61 -0.4538979 0.8069187
62 -0.4542826 0.8163290
63 -0.4557042 0.8285206
64 -0.4572005 0.8335650
65 -0.4638491 0.8413812
66 -0.4681885 0.8539095
67 -0.4775714 0.8633141
68 -0.4888333 0.8698490
69 -0.4952363 0.8791527
70 -0.4975383 0.8833882
71 -0.5088667 0.8863114
72 -0.5197474 0.8951534
73 -0.5316745 0.9085101
74 -0.5409388 0.9207023
75 -0.5572803 0.9282691
76 -0.5643576 0.9357900
77 -0.5751774 0.9517092
78 -0.5855919 0.9625510
79 -0.5995727 0.9781417
80 -0.6115650 0.9946185
81 -0.6198287 1.0071916
82 -0.6297608 1.0208370
83 -0.6447637 1.0357034
84 -0.6511860 1.0506364
85 -0.6659993 1.0608813
86 -0.6794852 1.0702993
87 -0.6893830 1.0801824
88 -0.7040491 1.1026626
89 -0.7183266 1.1196308
90 -0.7387399 1.1401544
91 -0.7541057 1.1561184
92 -0.7608552 1.1701851
93 -0.7783620 1.1855296
94 -0.7920760 1.2014060
95 -0.8063188 1.2157463
96 -0.8224106 1.2307841
97 -0.8377605 1.2484814
98 -0.8530954 1.2580503
99 -0.8684646 1.2731355
100 -0.8840083 1.2891893
From this I can easily plot the estimate__ column vs the x column to obtain my linear regression.
Now assuming I want to do the same but with my own STAN code using the stan() function:
library(rstan)
N <- 10
y <- rnorm(10)
x <- rnorm(10)
df <- data.frame(x, y)
fit <- stan('stan_test.stan', data = list(y = y, x = x, N = N))
print(fit)
Which yields the output:
Inference for Stan model: stan_test.
4 chains, each with iter=2000; warmup=1000; thin=1;
post-warmup draws per chain=1000, total post-warmup draws=4000.
mean se_mean sd 2.5% 25% 50% 75% 97.5% n_eff Rhat
alpha -0.35 0.01 0.43 -1.23 -0.62 -0.35 -0.09 0.50 2185 1
beta -0.26 0.01 0.57 -1.41 -0.60 -0.25 0.08 0.86 2075 1
sigma 1.26 0.01 0.41 0.74 0.99 1.17 1.43 2.27 1824 1
lp__ -6.19 0.04 1.50 -10.18 -6.87 -5.79 -5.07 -4.48 1282 1
Samples were drawn using NUTS(diag_e) at Fri Jun 03 10:08:50 2022.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
How would I obtain the same estimate__ column as well as the lower__ and upper__ columns?
Note, I know I can easily plot it using the intercept and slope means, but I would like to plot more complex models that can't be plotted as easily as such -- this is just a simple example.
My understanding is that brms estimates conditional effects by applying the model formula to a range of values for the variable you're interested in, with other variables set to appropriate baseline values. In order to do this, brms has to generate the new dataset, apply the model to it, and summarize appropriately. To my knowledge, rstan doesn't have built-in functions that do this; this means that, when we move from brms to rstan, we have to do these steps ourselves.
Here's one way to do it. I've done the first two steps (generate a new dataset and apply the model to it) within Stan, although it would be possible to use R instead.
Generate the new dataset
I've added a transformed data block to the basic Stan program. It finds the min and max observed values of x and creates a vector of 100 evenly spaced points between those two values. If you have more than one predictor for which you want to estimate conditional effects, you'll need to create a separate vector for each one.
data {
  int<lower=0> N;
  vector[N] x;
  vector[N] y;
}
transformed data {
  // How many values of the continuous variable will we use to estimate
  // conditional effects?
  // 100, to match the default behavior of conditional_effects.
  int n_cond_points = 100;
  vector[n_cond_points] x_cond_internal;
  // Space the values evenly between the min and max observed values.
  real point_diff = (max(x) - min(x)) / n_cond_points;
  for(i in 1:n_cond_points) {
    if(i == 1) {
      x_cond_internal[i] = min(x);
    } else if(i == n_cond_points) {
      x_cond_internal[i] = max(x);
    } else {
      x_cond_internal[i] = x_cond_internal[i - 1] + point_diff;
    }
  }
}
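As an aside (my own sketch, not part of the original answer): on Stan 2.24 or newer, the whole loop can be replaced by the built-in linspaced_vector, which spaces n points evenly from min(x) to max(x) in one line:
transformed data {
  int n_cond_points = 100;
  // assumes Stan >= 2.24, where linspaced_vector is available
  vector[n_cond_points] x_cond_internal = linspaced_vector(n_cond_points, min(x), max(x));
}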
Apply the model to the new dataset
I used the generated quantities block to apply the model to the new dataset. Three things are noteworthy here:
It's not possible to extract values from the transformed data block out of a stanfit object. As suggested by this answer, I've copied the new dataset into a variable in the generated quantities block so we can get it out of the stanfit object. (It will be the same across all draws.)
We have to specify the model by hand, in the same way it was specified in the model block. If the model changes there, it must be changed by hand in the same way in the generated quantities block.
If you have more than one predictor, you'll need to iterate over each predictor separately. In addition, while you're estimating conditional effects for one predictor, you'll need to choose an appropriate baseline values for the other predictors (0, mean, baseline category, or whatever is appropriate for your dataset).
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  y ~ normal(alpha + (beta * x), sigma);
}
generated quantities {
  // We can't extract transformed data from the stanfit object, so we copy the
  // values of x_cond here.
  vector[n_cond_points] x_cond = x_cond_internal;
  // Estimated value of y for each value of x.
  // Note that we have to specify the formula from the model block again; if
  // that formula changes, this one must be changed by hand to match.
  vector[n_cond_points] y_cond;
  for(i in 1:n_cond_points) {
    y_cond[i] = alpha + (beta * x_cond[i]);
  }
}
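Similarly (again my own sketch), since the conditional mean is just elementwise vector arithmetic, the generated quantities loop can be vectorized:
generated quantities {
  vector[n_cond_points] x_cond = x_cond_internal;
  // vectorized equivalent of the loop version above
  vector[n_cond_points] y_cond = alpha + beta * x_cond;
}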
Summarize the estimates
When we fit this Stan model, we get one estimate of y_cond per value of x_cond per draw, which is exactly what we want. We can summarize over draws in R:
library(tidyverse)
library(tidybayes)
fit2 <- stan('stan_test.stan', data = list(y = y, x = x, N = N))
cond.effects.df = spread_draws(fit2, x_cond[i], y_cond[i]) %>%
  ungroup() %>%
  dplyr::select(.draw, i, x = x_cond, y_cond) %>%
  group_by(i, x) %>%
  summarise(estimate__ = median(y_cond),
            lower__ = quantile(y_cond, 0.025),
            upper__ = quantile(y_cond, 0.975),
            .groups = "keep") %>%
  ungroup()
Comparing the two methods
The results of this procedure look pretty much the same as the output of brms. Here's what I got:
theme_set(theme_bw())
bind_rows(
  data[["x"]] %>%
    mutate(i = row_number(),
           method = "brms"),
  cond.effects.df %>%
    mutate(method = "by hand")
) %>%
  ggplot(aes(x = x, color = method, fill = method, group = method)) +
  geom_line(aes(y = estimate__)) +
  geom_ribbon(aes(ymin = lower__, ymax = upper__), color = NA, alpha = 0.2)

iGraph figures chopped off in R markdown

I have some code that generates layouts for the following minimum spanning tree "cell_dtree":
> cell_dtree
IGRAPH 951dfd5 D--- 720 719 --
+ edges from 951dfd5:
[1] 400-> 1 1-> 2 38-> 3 3-> 4 197-> 5 10-> 6 10-> 7 13-> 8 1-> 9 28-> 10 225-> 11 362-> 12 30-> 13
[14] 20-> 14 148-> 15 3-> 16 13-> 17 160-> 18 435-> 19 1-> 20 60-> 21 38-> 22 9-> 23 68-> 24 9-> 25 178-> 26
[27] 21-> 27 1-> 28 60-> 29 2-> 30 1-> 31 2-> 32 1-> 33 352-> 34 21-> 35 20-> 36 1-> 37 1-> 38 554-> 39
[40] 3-> 40 554-> 41 333-> 42 352-> 43 126-> 44 1-> 45 69-> 46 227-> 47 160-> 48 1-> 49 1-> 50 37-> 51 708-> 52
[53] 705-> 53 185-> 54 307-> 55 48-> 56 667-> 57 563-> 58 428-> 59 519-> 60 428-> 61 707-> 62 1-> 63 707-> 64 707-> 65
[66] 707-> 66 214-> 67 20-> 68 68-> 69 37-> 70 453-> 71 57-> 72 148-> 73 345-> 74 69-> 75 148-> 76 80-> 77 79-> 78
[79] 9-> 79 70-> 80 68-> 81 148-> 82 23-> 83 345-> 84 454-> 85 345-> 86 36-> 87 345-> 88 36-> 89 311-> 90 148-> 91
[92] 13-> 92 345-> 93 13-> 94 350-> 95 326-> 96 79-> 97 666-> 98 539-> 99 430->100 554->101 213->102 20->103 38->104
[105] 21->105 172->106 112->107 1->108 20->109 453->110 80->111 703->112 20->113 9->114 79->115 1->116 47->117
+ ... omitted several edges
Here is the R Markdown code to create the plot.
```{r, results='asis', fig.width = 7, fig.heigt = 7}
set.seed(1)
l2 <- igraph::layout.lgl(cell_dtree)
l2 <- igraph::layout.norm(l2, ymin = -2, ymax = 2, xmin = -2, xmax = 2)
plot(cell_dtree,
     rescale = F,
     layout = l2 * 1,
     edge.arrow.width = 0.1,
     vertex.label = NA,
     vertex.size = 1,
     vertex.label = NA,
     edge.width = 0.5,
     edge.arrow.size = 0.5,
     edge.arrow.width = 0.7)
```
When this document goes through knitr, the plot looks cut off: only a rectangular region in the center is displayed in the knitr HTML output.
To troubleshoot the issue, I ran igraph's implementation of an Erdős–Rényi graph, using the same number of vertices (720) as the graph above, and the same plotting parameters:
```{r, results='asis', out.width = '100%', out.height = '100%', fig.width = 7, fig.heigt = 7}
er <- igraph::sample_gnm(n = 720, m = 40)
plot(er, vertex.size = 6, vertex.label = NA)
set.seed(1)
l2 <- igraph::layout.lgl(er)
l2 <- igraph::layout.norm(l2, ymin = -2, ymax = 2, xmin = -2, xmax = 2)
plot(er,
     rescale = F,
     layout = l2 * 1,
     vertex.label = NA,
     vertex.size = 1,
     vertex.label = NA,
     edge.width = 0.5,
     edge.arrow.size = 0.5,
     edge.arrow.width = 0.7)
```
However, the resulting image does exactly what I want it to do: fill the figure box in an efficient way without those big white borders. Obviously not all the vertices are displayed here, but the scaling helps to show that the layout is using all available space for the image:
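Two hedged suggestions, since the thread shows no accepted fix. First, the chunk option fig.heigt is a misspelling of fig.height, so the intended 7-inch figure height never takes effect. Second, with rescale = F igraph keeps its default axis limits of c(-1, 1), so a layout normalized to [-2, 2] falls mostly outside the plotting window; normalizing to [-1, 1] (or passing wider xlim/ylim) should keep the whole tree visible. A sketch:
```{r, fig.width = 7, fig.height = 7}
set.seed(1)
l2 <- igraph::layout.lgl(cell_dtree)
l2 <- igraph::layout.norm(l2, ymin = -1, ymax = 1, xmin = -1, xmax = 1)
plot(cell_dtree,
     rescale = F,
     layout = l2,
     vertex.label = NA,
     vertex.size = 1,
     edge.width = 0.5,
     edge.arrow.size = 0.5,
     edge.arrow.width = 0.7)
```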

Function with a for loop to create a column with values 1:n conditioned by intervals matched by another column

I have a data frame like the following
my_df = data.frame(x = runif(100, min = 0, max = 60),
                   y = runif(100, min = 0, max = 60))  # x and y in cm
With this I need a new column with values from 1 to 36 that bin x and y every 10 cm. For example, if 0<=x<=10 & 0<=y<=10, put 1; if 10<=x<=20 & 0<=y<=10, put 2, and so on up to 6; then 0<=x<=10 & 10<=y<=20 starts at 7, up to 12, etc. I tried to make a function with an if repeating the interval for x 6 times and increasing the interval for y by 10 on every iteration. Here is the function:
# my miscarried function 'zones'
zones = function(x, y) {
  i = vector(length = 6)
  n = vector(length = 6)
  z = vector(length = 36)
  i[1] = 0
  z[1] = 0
  n[1] = 1
  for (t in 1:6) {
    if (0 <= x & x < 10 & i[t] <= y & y < i[t] + 10) { z[t] = n[t] } else
    if (10 <= x & x < 20 & i[t] <= y & y < i[t] + 10) { z[t] = n[t] + 1 } else
    if (20 <= x & x < 30 & i[t] <= y & y < i[t] + 10) { z[t] = n[t] + 2 } else
    if (30 <= x & x < 40 & i[t] <= y & y < i[t] + 10) { z[t] = n[t] + 3 } else
    if (40 <= x & x < 50 & i[t] <= y & y < i[t] + 10) { z[t] = n[t] + 4 } else
    if (50 <= x & x <= 60 & i[t] <= y & y < i[t] + 10) { z[t] = n[t] + 5 }
    else {
      i[t + 1] = i[t] + 10
      n[t + 1] = n[t] + 6
    }
  }
  return(z)
}
xy$z = zones(x = xy$x, y = xy$y)
and I got
There were 31 warnings (use warnings() to see them)
>xy$z
[1] 0 0 0 0 25 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Please, help me before I die alone!
I think this does the trick.
a <- cut(my_df$x, (0:6) * 10)
b <- cut(my_df$y, (0:6) * 10)
z <- interaction(a, b)
levels(z)
[1] "(0,10].(0,10]" "(10,20].(0,10]" "(20,30].(0,10]" "(30,40].(0,10]"
[5] "(40,50].(0,10]" "(50,60].(0,10]" "(0,10].(10,20]" "(10,20].(10,20]"
[9] "(20,30].(10,20]" "(30,40].(10,20]" "(40,50].(10,20]" "(50,60].(10,20]"
[13] "(0,10].(20,30]" "(10,20].(20,30]" "(20,30].(20,30]" "(30,40].(20,30]"
[17] "(40,50].(20,30]" "(50,60].(20,30]" "(0,10].(30,40]" "(10,20].(30,40]"
[21] "(20,30].(30,40]" "(30,40].(30,40]" "(40,50].(30,40]" "(50,60].(30,40]"
[25] "(0,10].(40,50]" "(10,20].(40,50]" "(20,30].(40,50]" "(30,40].(40,50]"
[29] "(40,50].(40,50]" "(50,60].(40,50]" "(0,10].(50,60]" "(10,20].(50,60]"
[33] "(20,30].(50,60]" "(30,40].(50,60]" "(40,50].(50,60]" "(50,60].(50,60]"
If these levels aren't to your taste, then change them as below:
levels(z) <- 1:36
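If you would rather have the zone as a plain integer column (my addition): because the x bins vary fastest in the interaction levels, the factor codes already run 1 to 36 in exactly the order the question asks for:
my_df$z <- as.integer(z)
head(my_df)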
Is this what you're after? The resulting numbers are in column res:
# Get bin index for x values and y values
my_df$bin1 <- as.numeric(cut(my_df$x, breaks = seq(0, max(my_df$x) + 10, by = 10)))
my_df$bin2 <- as.numeric(cut(my_df$y, breaks = seq(0, max(my_df$y) + 10, by = 10)))
# Combine bin indices: each x bin advances z by 1, each y bin advances z by 6
my_df$res <- my_df$bin1 + (my_df$bin2 - 1) * 6
> head(my_df)
          x         y bin1 bin2 res
1 49.887499 47.302849    5    5  29
2 43.169773 50.931357    5    6  35
3 10.626466 43.673533    2    5  26
4 43.401454  3.397009    5    1   5
5  7.080386 22.870539    1    3  13
6 39.094724 24.672907    4    3  16
I've broken down the steps for illustration purposes; you probably don't want to keep the intermediate columns bin1 and bin2.
We probably need a table showing the relationship between x, y, and z. After that, we can define a function to do the join.
The solution is related to and inspired by this post (R dplyr join by range or virtual column). You may also find the other solutions there useful.
# Set seed for reproducibility
set.seed(1)
# Create example data frame
my_df <- data.frame(x = runif(100, min = 0, max = 60),
                    y = runif(100, min = 0, max = 60))
# Load the dplyr package
library(dplyr)
# Create a table to show the relationship between x, y, and z
r <- expand.grid(x_from = seq(0, 50, 10), y_from = seq(0, 50, 10)) %>%
  mutate(x_to = x_from + 10, y_to = y_from + 10, z = 1:n())
# Define a function for dynamic join
dynamic_join <- function(d, r){
  if (!("z" %in% colnames(d))){
    d[["z"]] <- NA_integer_
  }
  d <- d %>%
    mutate(z = ifelse(x >= r$x_from & x < r$x_to & y >= r$y_from & y < r$y_to,
                      r$z, z))
  return(d)
}
re_dynamic_join <- function(d, r){
  r_list <- split(r, r$z)
  for (i in 1:length(r_list)){
    d <- dynamic_join(d, r_list[[i]])
  }
  return(d)
}
# Apply the function
re_dynamic_join(my_df, r)
x y z
1 15.930520 39.2834357 20
2 22.327434 21.1918363 15
3 34.371202 16.2156088 10
4 54.492467 59.5610437 36
5 12.100916 38.0095959 20
6 53.903381 12.7924881 12
7 56.680516 7.7623409 6
8 39.647868 28.6870821 16
9 37.746843 55.4444682 34
10 3.707176 35.9256580 19
11 12.358474 58.5702417 32
12 10.593405 43.9075507 26
13 41.221371 21.4036147 17
14 23.046223 25.8884214 15
15 46.190485 8.8926936 5
16 29.861955 0.7846545 3
17 43.057110 42.9339640 29
18 59.514366 6.1910541 6
19 22.802111 26.7770609 15
20 46.646713 38.4060627 23
21 56.082314 59.5103172 36
22 12.728551 29.7356147 14
23 39.100426 29.0609715 16
24 7.533306 10.4065401 7
25 16.033240 45.2892567 26
26 23.166846 27.2337294 15
27 0.803420 30.6701870 19
28 22.943277 12.4527068 9
29 52.181451 13.7194886 12
30 20.420940 35.7427198 21
31 28.924807 34.4923319 21
32 35.973950 4.6238628 4
33 29.612478 2.1324348 3
34 11.173056 38.5677295 20
35 49.642399 55.7169120 35
36 40.108004 35.8855453 23
37 47.654392 33.6540449 23
38 6.476618 31.5616634 19
39 43.422657 59.1057134 35
40 24.676466 30.4585093 21
41 49.256778 40.9672847 29
42 38.823612 36.0924731 22
43 46.975966 14.3321207 11
44 33.182179 15.4899556 10
45 31.783175 43.7585774 28
46 47.361374 27.1542499 17
47 1.399872 10.5076061 7
48 28.633804 44.8018962 27
49 43.938824 6.2992584 5
50 41.563893 51.8726969 35
51 28.657177 36.8786983 21
52 51.672569 33.4295723 24
53 26.285826 19.7266391 9
54 14.687837 27.1878867 14
55 4.240743 30.0264584 19
56 5.967970 10.8519817 7
57 18.976302 31.7778362 20
58 31.118056 4.5165447 4
59 39.720305 16.6653560 10
60 24.409811 12.7619712 9
61 54.772555 17.0874289 12
62 17.616202 53.7056462 32
63 27.543944 26.7741194 15
64 19.943680 46.7990934 26
65 39.052228 52.8371421 34
66 15.481007 24.7874526 14
67 28.712715 3.8285088 3
68 45.978640 20.1292495 17
69 5.054815 43.4235568 25
70 52.519280 20.2569200 18
71 20.344376 37.8248473 21
72 50.366421 50.4368732 36
73 20.801009 51.3678999 33
74 20.026496 23.4815569 15
75 28.581075 22.8296331 15
76 53.531900 53.7267256 36
77 51.860368 38.6589458 24
78 23.399373 44.4647189 27
79 46.639242 36.3182068 23
80 57.637080 54.1848967 36
81 26.079569 17.6238093 9
82 42.750881 11.4756066 11
83 23.999662 53.1870566 33
84 19.521129 30.2003691 20
85 45.425229 52.6234526 35
86 12.161535 11.3516173 8
87 42.667273 45.4861831 29
88 7.301515 43.4699336 25
89 14.729311 56.6234891 32
90 8.598263 32.8587952 19
91 14.377765 42.7046321 26
92 3.536063 23.3343060 13
93 38.537296 6.0523876 4
94 52.576153 55.6381253 36
95 46.734881 16.9939500 11
96 47.838530 35.4343895 23
97 27.316467 6.6216363 3
98 24.605045 50.4304219 33
99 48.652215 19.0778211 11
100 36.295997 46.9710802 28
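For what it's worth, a fully vectorized base-R alternative (my own addition, using the same arithmetic of 1 per x bin and 6 per y bin; note that findInterval bins are closed on the left, unlike cut's default):
# Zones 1..36 on a 6 x 6 grid of 10 cm cells
breaks <- seq(0, 60, by = 10)
my_df$z <- findInterval(my_df$x, breaks, rightmost.closed = TRUE) +
  (findInterval(my_df$y, breaks, rightmost.closed = TRUE) - 1) * 6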

Aesthetics must be either length 1 or the same as the data: ymin, ymax, x, y, colour, when using a second geom_errorbar function

I'm trying to add error bars to a second curve (using dataset "pmfprofbs01"), but I'm having problems that I can't fix.
There are a few threads on this error, but unfortunately every answer seems case specific, and I'm not able to overcome the error in my code. I am able to plot a first smoothed curve (stat_smooth) with overlapping error bars (using geom_errorbar). The problem arises when I try to add a second curve to the same graph for comparison purposes.
With the following code, I get this error: "Error: Aesthetics must be either length 1 or the same as the data (35): ymin, ymax, x, y, colour"
I am looking to add error bars to the second smoothed curve (corresponding to datasets pmfprof01 and pmfprofbs01).
Could someone explain why I keep getting this error? The code works up until the second call of geom_errorbar().
These are my 4 datasets (all used as data frames):
- pmfprof1 and pmfprof01 are the two datasets used for applying the smoothing method.
- pmfprofbs1 and pmfprofbs01 contain additional information based on an error analysis for plotting error bars.
> pmfprof1
Z correctedpmfprof1
1 -1.1023900 -8.025386e-22
2 -1.0570000 6.257110e-02
3 -1.0116000 1.251420e-01
4 -0.9662020 2.143170e-01
5 -0.9208040 3.300960e-01
6 -0.8754060 4.658550e-01
7 -0.8300090 6.113410e-01
8 -0.7846110 4.902430e-01
9 -0.7392140 3.344200e-01
10 -0.6938160 4.002040e-01
11 -0.6484190 1.215460e-01
12 -0.6030210 -1.724360e-01
13 -0.5576240 -6.077170e-01
14 -0.5122260 -1.513420e+00
15 -0.4668290 -2.075330e+00
16 -0.4214310 -2.617160e+00
17 -0.3760340 -3.350500e+00
18 -0.3306360 -4.076220e+00
19 -0.2852380 -4.926540e+00
20 -0.2398410 -5.826390e+00
21 -0.1944430 -6.761300e+00
22 -0.1490460 -7.301530e+00
23 -0.1036480 -7.303880e+00
24 -0.0582507 -7.026800e+00
25 -0.0128532 -6.627960e+00
26 0.0325444 -6.651490e+00
27 0.0779419 -6.919830e+00
28 0.1233390 -6.686490e+00
29 0.1687370 -6.129060e+00
30 0.2141350 -6.120890e+00
31 0.2595320 -6.455160e+00
32 0.3049300 -6.554560e+00
33 0.3503270 -6.983390e+00
34 0.3957250 -7.413500e+00
35 0.4411220 -6.697370e+00
36 0.4865200 -5.477230e+00
37 0.5319170 -4.552890e+00
38 0.5773150 -3.393060e+00
39 0.6227120 -2.449930e+00
40 0.6681100 -2.183190e+00
41 0.7135080 -1.673980e+00
42 0.7589050 -8.003740e-01
43 0.8043030 -2.918780e-01
44 0.8497000 -1.159710e-01
45 0.8950980 9.123767e-22
> pmfprof01
Z correctedpmfprof01
1 -1.25634000 -1.878749e-21
2 -1.20387000 -1.750190e-01
3 -1.15141000 -3.500380e-01
4 -1.09894000 -6.005650e-01
5 -1.04647000 -7.935110e-01
6 -0.99400600 -8.626150e-01
7 -0.94153900 -1.313880e+00
8 -0.88907200 -2.067770e+00
9 -0.83660500 -2.662440e+00
10 -0.78413800 -4.514190e+00
11 -0.73167100 -7.989510e+00
12 -0.67920400 -1.186870e+01
13 -0.62673800 -1.535970e+01
14 -0.57427100 -1.829150e+01
15 -0.52180400 -2.067170e+01
16 -0.46933700 -2.167890e+01
17 -0.41687000 -2.069820e+01
18 -0.36440300 -1.662640e+01
19 -0.31193600 -1.265950e+01
20 -0.25946900 -1.182580e+01
21 -0.20700200 -1.213370e+01
22 -0.15453500 -1.233680e+01
23 -0.10206800 -1.235160e+01
24 -0.04960160 -1.123630e+01
25 0.00286531 -9.086940e+00
26 0.05533220 -6.562710e+00
27 0.10779900 -4.185860e+00
28 0.16026600 -3.087430e+00
29 0.21273300 -2.005150e+00
30 0.26520000 -9.295540e-02
31 0.31766700 1.450360e+00
32 0.37013400 1.123910e+00
33 0.42260100 2.426750e-01
34 0.47506700 1.213370e-01
35 0.52753400 5.265226e-21
> pmfprofbs1
Z correctedpmfprof01 bsmean bssd bsse bsci
1 -1.1023900 -8.025386e-22 0.00000000 0.0000000 0.00000000 0.0000000
2 -1.0570000 6.257110e-02 1.46519200 0.6691245 0.09974719 0.2010273
3 -1.0116000 1.251420e-01 1.62453300 0.6368053 0.09492933 0.1913175
4 -0.9662020 2.143170e-01 1.62111600 0.7200497 0.10733867 0.2163269
5 -0.9208040 3.300960e-01 1.44754700 0.7236743 0.10787900 0.2174158
6 -0.8754060 4.658550e-01 1.67509800 0.7148755 0.10656735 0.2147724
7 -0.8300090 6.113410e-01 1.78144200 0.7374481 0.10993227 0.2215539
8 -0.7846110 4.902430e-01 1.73058700 0.7701354 0.11480501 0.2313743
9 -0.7392140 3.344200e-01 0.97430090 0.7809477 0.11641681 0.2346227
10 -0.6938160 4.002040e-01 1.26812000 0.8033838 0.11976139 0.2413632
11 -0.6484190 1.215460e-01 0.93601510 0.7927926 0.11818254 0.2381813
12 -0.6030210 -1.724360e-01 0.63201080 0.8210839 0.12239996 0.2466809
13 -0.5576240 -6.077170e-01 0.05952252 0.8653050 0.12899205 0.2599664
14 -0.5122260 -1.513420e+00 0.57893690 0.8858471 0.13205429 0.2661379
15 -0.4668290 -2.075330e+00 -0.08164613 0.8921298 0.13299086 0.2680255
16 -0.4214310 -2.617160e+00 -1.08074600 0.8906925 0.13277660 0.2675937
17 -0.3760340 -3.350500e+00 -1.67279700 0.9081813 0.13538367 0.2728479
18 -0.3306360 -4.076220e+00 -2.50074900 1.0641550 0.15863486 0.3197076
19 -0.2852380 -4.926540e+00 -3.12062200 1.0639080 0.15859804 0.3196333
20 -0.2398410 -5.826390e+00 -4.47060100 1.1320770 0.16876008 0.3401136
21 -0.1944430 -6.761300e+00 -5.40812700 1.1471780 0.17101120 0.3446504
22 -0.1490460 -7.301530e+00 -6.42419100 1.1685490 0.17419700 0.3510710
23 -0.1036480 -7.303880e+00 -5.79613500 1.1935850 0.17792915 0.3585926
24 -0.0582507 -7.026800e+00 -5.85496900 1.2117630 0.18063896 0.3640539
25 -0.0128532 -6.627960e+00 -6.70480400 1.1961400 0.17831002 0.3593602
26 0.0325444 -6.651490e+00 -8.27106200 1.3376870 0.19941060 0.4018857
27 0.0779419 -6.919830e+00 -8.79402900 1.3582760 0.20247983 0.4080713
28 0.1233390 -6.686490e+00 -8.35947700 1.3673080 0.20382624 0.4107848
29 0.1687370 -6.129060e+00 -8.04437600 1.3921620 0.20753126 0.4182518
30 0.2141350 -6.120890e+00 -8.18588300 1.5220550 0.22689456 0.4572759
31 0.2595320 -6.455160e+00 -8.37217600 1.5436800 0.23011823 0.4637728
32 0.3049300 -6.554560e+00 -8.59346400 1.6276880 0.24264140 0.4890116
33 0.3503270 -6.983390e+00 -8.88378700 1.6557140 0.24681927 0.4974316
34 0.3957250 -7.413500e+00 -9.72709800 1.6569390 0.24700188 0.4977996
35 0.4411220 -6.697370e+00 -9.46033400 1.6378470 0.24415582 0.4920637
36 0.4865200 -5.477230e+00 -8.37590600 1.6262700 0.24243002 0.4885856
37 0.5319170 -4.552890e+00 -7.52867000 1.6617010 0.24771176 0.4992302
38 0.5773150 -3.393060e+00 -6.89192300 1.6667330 0.24846189 0.5007420
39 0.6227120 -2.449930e+00 -6.25115300 1.6670390 0.24850750 0.5008340
40 0.6681100 -2.183190e+00 -6.05373800 1.6720180 0.24924973 0.5023298
41 0.7135080 -1.673980e+00 -5.10526700 1.6668400 0.24847784 0.5007742
42 0.7589050 -8.003740e-01 -4.42001600 1.6561830 0.24688918 0.4975725
43 0.8043030 -2.918780e-01 -4.26640200 1.6588970 0.24729376 0.4983878
44 0.8497000 -1.159710e-01 -4.46318500 1.6533830 0.24647179 0.4967312
45 0.8950980 9.123767e-22 -5.17173200 1.6557990 0.24683194 0.4974571
> pmfprofbs01
Z correctedpmfprof01 bsmean bssd bsse bsci
1 -1.25634000 -1.878749e-21 0.000000 0.0000000 0.00000000 0.0000000
2 -1.20387000 -1.750190e-01 2.316589 0.4646486 0.07853995 0.1596124
3 -1.15141000 -3.500380e-01 2.320647 0.4619668 0.07808664 0.1586911
4 -1.09894000 -6.005650e-01 2.635883 0.6519826 0.11020517 0.2239639
5 -1.04647000 -7.935110e-01 2.814679 0.6789875 0.11476983 0.2332404
6 -0.99400600 -8.626150e-01 2.588038 0.7324196 0.12380151 0.2515949
7 -0.94153900 -1.313880e+00 2.033736 0.7635401 0.12906183 0.2622852
8 -0.88907200 -2.067770e+00 2.394285 0.8120181 0.13725611 0.2789380
9 -0.83660500 -2.662440e+00 2.465425 0.9485307 0.16033095 0.3258317
10 -0.78413800 -4.514190e+00 0.998115 1.0177400 0.17202946 0.3496059
11 -0.73167100 -7.989510e+00 -1.585430 1.0502190 0.17751941 0.3607628
12 -0.67920400 -1.186870e+01 -5.740894 1.2281430 0.20759406 0.4218819
13 -0.62673800 -1.535970e+01 -9.325951 1.3289330 0.22463068 0.4565045
14 -0.57427100 -1.829150e+01 -12.010540 1.3279860 0.22447060 0.4561792
15 -0.52180400 -2.067170e+01 -14.672770 1.3296720 0.22475559 0.4567583
16 -0.46933700 -2.167890e+01 -14.912250 1.3192610 0.22299581 0.4531820
17 -0.41687000 -2.069820e+01 -12.850570 1.3288470 0.22461614 0.4564749
18 -0.36440300 -1.662640e+01 -6.093746 1.3497100 0.22814263 0.4636416
19 -0.31193600 -1.265950e+01 -5.210692 1.3602240 0.22991982 0.4672533
20 -0.25946900 -1.182580e+01 -6.041660 1.3818700 0.23357866 0.4746890
21 -0.20700200 -1.213370e+01 -5.765808 1.3854680 0.23418683 0.4759249
22 -0.15453500 -1.233680e+01 -6.985883 1.4025360 0.23707185 0.4817880
23 -0.10206800 -1.235160e+01 -7.152865 1.4224030 0.24042999 0.4886125
24 -0.04960160 -1.123630e+01 -3.600538 1.4122650 0.23871635 0.4851300
25 0.00286531 -9.086940e+00 -0.751673 1.5764920 0.26647578 0.5415439
26 0.05533220 -6.562710e+00 2.852910 1.5535620 0.26259991 0.5336672
27 0.10779900 -4.185860e+00 5.398850 1.5915640 0.26902342 0.5467214
28 0.16026600 -3.087430e+00 6.262459 1.6137360 0.27277117 0.5543377
29 0.21273300 -2.005150e+00 8.047920 1.6283340 0.27523868 0.5593523
30 0.26520000 -9.295540e-02 11.168640 1.6267620 0.27497297 0.5588123
31 0.31766700 1.450360e+00 12.345900 1.6363310 0.27659042 0.5620994
32 0.37013400 1.123910e+00 12.124650 1.6289230 0.27533824 0.5595546
33 0.42260100 2.426750e-01 11.279890 1.6137100 0.27276677 0.5543288
34 0.47506700 1.213370e-01 11.531670 1.6311490 0.27571450 0.5603193
35 0.52753400 5.265226e-21 11.284980 1.6662890 0.28165425 0.5723903
The code for plotting both curves is:
deltamean01 <- pmfprofbs01[,"bsmean"] - pmfprofbs01[,"correctedpmfprof01"]
correctmean01 <- pmfprofbs01[,"bsmean"] - deltamean01
deltamean1 <- pmfprofbs1[,"bsmean"] - pmfprofbs1[,"correctedpmfprof1"]
correctmean1 <- pmfprofbs1[,"bsmean"] - deltamean1
pl <- ggplot(pmfprof1, aes(x = pmfprof1[,1], y = pmfprof1[,2], colour = "red")) +
  list(
    stat_smooth(method = "gam", formula = y ~ s(x), size = 1,
                colour = "chartreuse3", fill = "chartreuse3", alpha = 0.3),
    geom_line(data = pmfprof1, linetype = 4, size = 0.5, colour = "chartreuse3"),
    geom_errorbar(aes(ymin = correctmean1 - pmfprofbs1[,"bsci"],
                      ymax = correctmean1 + pmfprofbs1[,"bsci"]),
                  data = pmfprofbs1, colour = "chartreuse3",
                  width = 0.02, size = 0.9),
    geom_point(data = pmfprof1, size = 1, colour = "chartreuse3"),
    xlab(expression(xi*(nm))),
    ylab("PMF (KJ/mol)"),
    ## GCD
    geom_errorbar(aes(ymin = correctmean01 - pmfprofbs01[,"bsci"],
                      ymax = correctmean01 + pmfprofbs01[,"bsci"]),
                  data = pmfprofbs01,
                  width = 0.02, size = 0.9),
    geom_line(data = pmfprof01, aes(x = pmfprof01[,1], y = pmfprof01[,2]),
              linetype = 4, size = 0.5, colour = "darkgreen"),
    stat_smooth(data = pmfprof01, aes(x = pmfprof01[,1], y = pmfprof01[,2]),
                method = "gam", formula = y ~ s(x), size = 1,
                colour = "darkgreen", fill = "darkgreen", alpha = 0.3),
    theme(text = element_text(size = 20),
          axis.text.x = element_text(size = 20, colour = "black"),
          axis.text.y = element_text(size = 20, colour = "black")),
    scale_x_continuous(breaks = number_ticks(8)),
    scale_y_continuous(breaks = number_ticks(8)),
    theme(panel.background = element_rect(fill = 'white', colour = 'gray')),
    theme(plot.background = element_rect(fill = 'white', colour = 'white')),
    theme(legend.position = "none"),
    theme(legend.key = element_blank()),
    theme(legend.title = element_text(colour = 'gray', size = 20)),
    NULL
  )
pl
This is the result of using pl: https://i.stack.imgur.com/x8FjY.png
Thanks in advance for any suggestion,
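A hedged diagnosis: the top-level ggplot() call maps x and y to vectors extracted from pmfprof1 (45 values), and every later layer inherits those mappings, so a layer given data = pmfprofbs01 (35 rows) ends up mixing 35- and 45-long aesthetics, which is exactly the length error reported. Mapping columns by name and disabling inheritance for layers that use a different data frame avoids the clash. A minimal sketch for the second curve (note that correctmean01 reduces algebraically to the corrected PMF column itself, since bsmean - (bsmean - correctedpmfprof01) == correctedpmfprof01):
library(ggplot2)
pmfprofbs01$correctmean <- pmfprofbs01$correctedpmfprof01
ggplot(pmfprof01, aes(x = Z, y = correctedpmfprof01)) +
  stat_smooth(method = "gam", formula = y ~ s(x), size = 1,
              colour = "darkgreen", fill = "darkgreen", alpha = 0.3) +
  geom_line(linetype = 4, size = 0.5, colour = "darkgreen") +
  geom_errorbar(data = pmfprofbs01,
                aes(x = Z, ymin = correctmean - bsci, ymax = correctmean + bsci),
                inherit.aes = FALSE,
                colour = "darkgreen", width = 0.02, size = 0.9)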

Given data points and y value, give x value

Given a set of (x,y) coordinates, how can I solve for x from y? If you were to plot the coordinates, they would be non-linear but pretty close to exponential. I tried approx(), but it is way off. Here is example data. In this scenario, how could I solve for the x at which y == 50?
V1 V3
1 5.35 11.7906
2 10.70 15.0451
3 16.05 19.4243
4 21.40 20.7885
5 26.75 22.0584
6 32.10 25.4367
7 37.45 28.6701
8 42.80 30.7500
9 48.15 34.5084
10 53.50 37.0096
11 58.85 39.3423
12 64.20 41.5023
13 69.55 43.4599
14 74.90 44.7299
15 80.25 46.5738
16 85.60 47.7548
17 90.95 49.9749
18 96.30 51.0331
19 101.65 52.0207
20 107.00 52.9781
21 112.35 53.8730
22 117.70 54.2907
23 123.05 56.3025
24 128.40 56.6949
25 133.75 57.0830
26 139.10 58.5051
27 144.45 59.1440
28 149.80 60.0687
29 155.15 60.6627
30 160.50 61.2313
31 165.85 61.7748
32 171.20 62.5587
33 176.55 63.2684
34 181.90 63.7085
35 187.25 64.0788
36 192.60 64.5807
37 197.95 65.2233
38 203.30 65.5331
39 208.65 66.1200
40 214.00 66.6208
41 219.35 67.1952
42 224.70 67.5270
43 230.05 68.0175
44 235.40 68.3869
45 240.75 68.7485
46 246.10 69.1878
47 251.45 69.3980
48 256.80 69.5899
49 262.15 69.7382
50 267.50 69.7693
51 272.85 69.7693
52 278.20 69.7693
53 283.55 69.7693
54 288.90 69.7693
I suppose the problem you have is that approx solves for y given x, while you are talking about solving for x given y. So you need to switch your variables x and y when using approx:
df <- read.table(textConnection("
V1 V3
85.60 47.7548
90.95 49.9749
96.30 51.0331
101.65 52.0207
"), header = TRUE)
approx(x = df$V3, y = df$V1, xout = 50)
# $x
# [1] 50
#
# $y
# [1] 91.0769
Also, since your data suggest a linear relationship between log(x) and y, it makes more sense to interpolate log(x) linearly against y, then take the exponential to get back to x:
exp(approx(x = df$V3, y = log(df$V1), xout = 50)$y)
# [1] 91.07339
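Since the data are monotone, another option (my addition, not from the original answer) is to fit a monotone interpolating spline and invert it numerically; with the full table from the question in df:
f <- splinefun(df$V1, df$V3, method = "monoH.FC")
uniroot(function(x) f(x) - 50, interval = range(df$V1))$root
# ~91, in line with the linear interpolation above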
