Nested reshape from wide to long - r

I keep getting all sorts of error messages when trying to reshape an object into long format. Toy data:
d <- structure(c(0.204, 0.036, 0.015, 0.013, 0.208, 0.037, 0.015,
0.006, 0.186, 0.044, 0.016, 0.023, 0.251, 0.044, 0.02, 0.01,
0.268, 0.04, 0.007, 0.007, 0.208, 0.062, 0.027, 0.036, 0.272,
0.054, 0.006, 0.01, 0.274, 0.05, 0.011, 0.006, 0.28, 0.039, 0.007,
0.019, 1.93, 0.345, 0.087, 0.094, 2.007, 0.341, 0.064, 0.061,
1.733, 0.39, 0.131, 0.201, 0.094, 0.01, 0.004, 0, 0.096, 0.014,
0, 0.001, 0.081, 0.016, 0.002, 0.016, 0.062, 0.007, 0.011, 0.001,
0.07, 0.003, 0.005, 0.002, 0.043, 0.033, 0, 0.007, 0.081, 0.039,
0.007, 0, 0.085, 0.033, 0.008, 0, 0.086, 0.023, 0.007, 0.007,
0.083, 0.015, 0, 0, 0.09, 0.009, 0, 0, 0.049, 0.052, 0, 0.025,
2.779, 0.203, 0.098, 0.016, 2.801, 0.242, 0.135, 0.01, 2.12,
0.466, 0.177, 0.121, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 0, 1,
2, 3, 0, 1, 2, 3, 0, 1, 2, 3), .Dim = c(12L, 11L), .Dimnames = list(
c("0", "1", "2", "3", "0", "1", "2", "3", "0", "1", "2",
"3"), c("age_77", "age_78", "age_79", "age_80", "age_81",
"age_82", "age_83", "age_84", "age_85", "item", "k")))
Basically, I have different ages, and for each age three items were reported with four response categories each. I would like to obtain a long-format object with the columns age, item, k and proportion, like this (first two ages shown):
structure(c(77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 77, 78,
78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 78, 1, 1, 1, 1, 2, 2,
2, 2, 3, 3, 3, 3, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 0, 1, 2,
3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3,
0.204, 0.036, 0.015, 0.013, 0.208, 0.037, 0.015, 0.006, 0.186,
0.044, 0.016, 0.023, 0.251, 0.044, 0.02, 0.01, 0.268, 0.04, 0.007,
0.007, 0.208, 0.062, 0.027, 0.036), .Dim = c(24L, 4L), .Dimnames = list(
c("0", "1", "2", "3", "0", "1", "2", "3", "0", "1", "2",
"3", "0", "1", "2", "3", "0", "1", "2", "3", "0", "1", "2",
"3"), c("age", "item", "k", "proportion")))
An example I tried:
reshape(as.data.frame(d), varying =1:9, sep = "_", direction = "long",
times = "k", idvar = "item")
Error in `row.names<-.data.frame`(`*tmp*`, value = paste(ids, times[i], :
duplicate 'row.names' are not allowed
Any clue where my mistake is? Thanks a lot in advance!

The object d as provided by the OP is a matrix, not a data.frame:
str(d)
num [1:12, 1:11] 0.204 0.036 0.015 0.013 0.208 0.037 0.015 0.006 0.186 0.044 ...
- attr(*, "dimnames")=List of 2
..$ : chr [1:12] "0" "1" "2" "3" ...
..$ : chr [1:11] "age_77" "age_78" "age_79" "age_80" ...
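In addition, the row names of d are not unique. The duplicate 'row.names' error itself comes from idvar = "item": item does not identify rows uniquely (each value occurs four times), and reshape() builds the new row names by pasting the id and time values together, so those names end up duplicated.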
With data.table, d can be coerced to a data.table object and reshaped from wide to long format using melt(). Finally, age is extracted from the column names and stored as integer values as requested by the OP.
library(data.table)
melt(as.data.table(d), measure.vars = patterns("^age_"),
variable.name = "age", value.name = "proportion")[
, age := as.integer(stringr::str_replace(age, "age_", ""))][]
item k age proportion
1: 1 0 77 0.204
2: 1 1 77 0.036
3: 1 2 77 0.015
4: 1 3 77 0.013
5: 2 0 77 0.208
---
104: 2 3 85 0.010
105: 3 0 85 2.120
106: 3 1 85 0.466
107: 3 2 85 0.177
108: 3 3 85 0.121
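If you prefer base R's reshape(), here is a minimal sketch under some assumptions: d_small is a hypothetical two-age subset of the OP's d (values copied from it), and id is an added helper column. The key point is that idvar must identify rows uniquely, which item does not, and the age values are passed explicitly through times.

```r
# Hypothetical two-age subset of the OP's matrix d (items 1-2, k = 0..3)
d_small <- cbind(
  age_77 = c(0.204, 0.036, 0.015, 0.013, 0.208, 0.037, 0.015, 0.006),
  age_78 = c(0.251, 0.044, 0.020, 0.010, 0.268, 0.040, 0.007, 0.007),
  item   = c(1, 1, 1, 1, 2, 2, 2, 2),
  k      = c(0, 1, 2, 3, 0, 1, 2, 3)
)

df <- as.data.frame(d_small)
# item repeats, so it cannot be the idvar; add a unique row id instead
df$id <- seq_len(nrow(df))

age_cols <- grep("^age_", names(df), value = TRUE)
long <- reshape(
  df,
  direction = "long",
  varying   = list(age_cols),                          # one group of columns to stack
  v.names   = "proportion",                            # name of the stacked value column
  timevar   = "age",
  times     = as.integer(sub("^age_", "", age_cols)),  # 77, 78
  idvar     = "id"
)

# Keep the requested columns and reset the "id.age" row names
long <- long[, c("age", "item", "k", "proportion")]
rownames(long) <- NULL
long
```

With the full d, age_cols picks up all nine age_* columns.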

Related

geom_smooth not working for trendline, too few points?

I am trying to get a trendline for my two sets of averages. In my main graph I will put error bars on the points to show the SDs, but below is a simplified version:
ggplot(sl, aes(x=Stresslevel, y=Final, color=Treatment)) +
geom_point() +
geom_smooth(method = "lm")
In my output I can see in the legend that it is trying to add the line, but it does not show up on the graph:
[image: plot output]
Here is an image of the data:
[image: data]
Edit: Here is my data (thank you for the advice on how to get it):
dput(sl)
structure(list(Stresslevel = structure(c(1L, 2L, 3L, 4L, 5L,
6L, 7L, 3L, 4L, 5L), .Label = c("0", "1", "2 (30%)", "3 (50%)",
"4 (70%)", "5", "Recovered"), class = "factor"), WL = c(0, 15.5,
32.8, 52.9, 69.8, 89.2, 13.5, 30, 50, 70), WLsd = c(5, 6.5, 8.1,
8.8, 10.6, 4.2, 9.8, 5, 5, 5), Final = c(0.0292, 0.0276, 0.0263,
0.0248, 0.0208, 0.0199, 0.0249, 0.0274, 0.0235, 0.0121), Treatment = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L), .Label = c("Stressed", "Treated"
), class = "factor"), Finalsd = c(0.0039, 0.0019, 0.0026, 0.0033,
0.002, 0.0021, 0.0028, 0.0049, 0.0048, 0.0026), Dry = c(0.006,
0.008, 0.0107, 0.0139, 0.0138, 0.0174, 0.0047, 0.008, 0.0116,
0.0105), Drysd = c(0.0015, 0.0015, 0.0017, 0.0024, 0.0011, 0.0022,
0.001, 0.0016, 0.0033, 0.0021), Delta = c(0.0231, 0.0196, 0.0155,
0.0109, 0.007, 0.0025, 0.0201, 0.0194, 0.012, 0.0016), Deltasd = c(0.0034,
0.0015, 0.0019, 0.002, 0.0024, 0.001, 0.0025, 0.0043, 0.0035,
0.0013), WC = c(4.07, 2.54, 1.48, 0.81, 0.52, 0.15, 4.44, 2.48,
1.11, 0.16), WCsd = c(1.22, 0.59, 0.26, 0.21, 0.2, 0.08, 1.06,
0.56, 0.45, 0.12), CD = c(1, 1.33, 1.78, 2.31, 2.29, 2.89, 0.78,
1.33, 1.92, 1.75), CDsd = c(0.24, 0.25, 0.28, 0.4, 0.19, 0.37,
0.16, 0.26, 0.54, 0.35)), class = "data.frame", row.names = c(NA,
-10L))
Any help would be greatly appreciated.
Your x variable is a factor, meaning it is a categorical variable, so it's not clear how to fit a regression line through that:
str(sl)
'data.frame': 10 obs. of 14 variables:
$ Stresslevel: Factor w/ 7 levels "0","1","2 (30%)",..: 1 2 3 4 5 6 7 3 4 5
$ WL : num 0 15.5 32.8 52.9 69.8 89.2 13.5 30 50 70
I am not sure it makes sense to convert your categories to numeric (stress level 0 becomes 1, stress level 1 becomes 2, etc.) and force a line through them, but this is how you would do it:
ggplot(sl, aes(x=Stresslevel, y=Final, color=Treatment)) +
geom_point() +
geom_smooth(aes(x=as.numeric(Stresslevel)),method = "lm",se=FALSE)
It might make more sense to connect the points with lines, if you want to look at the progression of your dependent variable from stress level 0 to 5:
ggplot(sl, aes(x=Stresslevel, y=Final, color=Treatment)) +
geom_point() +
geom_line(aes(x=as.numeric(Stresslevel)),linetype="dashed")
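A sketch of what as.numeric() does to the Stresslevel factor, using the Stresslevel labels and the Final values of the "Stressed" rows copied from dput(sl). as.numeric() returns the internal level codes, so "Recovered" is placed at x = 7 as if it were the next stress level. Parsing the number at the start of each label instead (assuming every numeric label starts with its stress level) gives 0 to 5 and NA for "Recovered", which lm() drops with the default na.action.

```r
# Stresslevel labels and Final values of the "Stressed" rows in dput(sl)
lvl    <- c("0", "1", "2 (30%)", "3 (50%)", "4 (70%)", "5", "Recovered")
stress <- factor(lvl, levels = lvl)
final  <- c(0.0292, 0.0276, 0.0263, 0.0248, 0.0208, 0.0199, 0.0249)

# Level codes 1..7: "0" becomes 1 and "Recovered" becomes 7
codes <- as.numeric(stress)

# Leading number of each label; "Recovered" has none and becomes NA
# (the "NAs introduced by coercion" warning is suppressed on purpose)
stress_num <- suppressWarnings(as.numeric(sub(" .*", "", lvl)))

# With the default na.action, lm() drops the NA row, so "Recovered"
# is left out of fit_num but included (at x = 7) in fit_codes
fit_codes <- lm(final ~ codes)
fit_num   <- lm(final ~ stress_num)
coef(fit_codes)
coef(fit_num)
```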

Trying to downsize a dataframe by increasing the timestep in R

I have a column that records time at millisecond resolution, starting at 0 and increasing uniformly by 0.001. I would like to downsize my dataframe by creating a new dataframe that only keeps the rows that occur every ten milliseconds.
My problem is that the data is in long format and not all participants took the same amount of time to complete the task, so I cannot just take every 10th row.
To clarify: whenever the time column is 0.000, I would like to keep that row in the new dataframe and then restart the process of taking every tenth millisecond. So far I have tried using "filter" and "subset" with no success.
This is a small example of the data I have:
ID  Time   X    Y
1   0.000  1    5
1   0.001  2    10
1   0.002  3    15
1   0.003  4    20
1   0.004  ...  ...   (and so on until 0.052)
1   0.053  10   25
2   0.000  30   30
2   0.001  35   35
2   0.002  ...  ...   (and so on until 0.036)
2   0.037  50   55
3   0.000  55   50
And this is what I would like:
ID  Time   X   Y
1   0.000  1   5
1   0.010  30  40
1   0.020  35  45
1   0.030  30  40
1   0.040  33  44
1   0.050  60  100
2   0.000  30  30
2   0.010  40  40
2   0.020  50  50
2   0.030  60  60
3   0.000  55  50
You can try subset + ave + duplicated, which keeps the 1st, 11th, 21st, ... row within each ID:
subset(
df,
!ave(Time, ID, FUN = function(x) duplicated(ceiling(seq_along(x) / 10)))
)
which gives
ID Time
1 1 0.00
11 1 0.01
21 1 0.02
31 1 0.03
41 1 0.04
51 1 0.05
55 2 0.00
65 2 0.01
75 2 0.02
85 2 0.03
92 3 0.00
Data
> dput(df)
structure(list(ID = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 3), Time = c(0,
0.001, 0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009,
0.01, 0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018,
0.019, 0.02, 0.021, 0.022, 0.023, 0.024, 0.025, 0.026, 0.027,
0.028, 0.029, 0.03, 0.031, 0.032, 0.033, 0.034, 0.035, 0.036,
0.037, 0.038, 0.039, 0.04, 0.041, 0.042, 0.043, 0.044, 0.045,
0.046, 0.047, 0.048, 0.049, 0.05, 0.051, 0.052, 0.053, 0, 0.001,
0.002, 0.003, 0.004, 0.005, 0.006, 0.007, 0.008, 0.009, 0.01,
0.011, 0.012, 0.013, 0.014, 0.015, 0.016, 0.017, 0.018, 0.019,
0.02, 0.021, 0.022, 0.023, 0.024, 0.025, 0.026, 0.027, 0.028,
0.029, 0.03, 0.031, 0.032, 0.033, 0.034, 0.035, 0.036, 0)), row.names = c(NA,
-92L), class = "data.frame")
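An alternative sketch that filters on the Time value itself rather than on row positions. It assumes Time lies on an exact 0.001 grid up to floating-point error, so Time * 1000 is rounded to whole milliseconds before testing for multiples of 10. The df below rebuilds the ID and Time columns of the data above with seq().

```r
# Rebuild the example data: ID 1 runs 0 to 0.053, ID 2 runs 0 to 0.036, ID 3 has one row
df <- data.frame(
  ID   = c(rep(1, 54), rep(2, 37), 3),
  Time = c(seq(0, 0.053, by = 0.001), seq(0, 0.036, by = 0.001), 0)
)

# Whole milliseconds; round() absorbs small floating-point errors in Time * 1000
ms <- round(df$Time * 1000)

# Keep rows whose time is a multiple of 10 ms; each ID restarts at 0,
# so no grouping by ID is needed
df_10ms <- df[ms %% 10 == 0, ]
df_10ms

# For this evenly spaced data it selects the same rows as subset + ave + duplicated
keep_ave <- !ave(df$Time, df$ID, FUN = function(x) duplicated(ceiling(seq_along(x) / 10)))
all(keep_ave == (ms %% 10 == 0))
```

Unlike counting rows, this still works if some rows are missing, but a 10 ms timestamp that is itself missing is simply skipped.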

Reorder row names of a matrices in a list and replace NaN and zeros with ones

I have a list of matrices and I need to reorder their rows. There are seven letter-rating categories, but each row name is a combination of two ratings separated by a hyphen. I would like the rows to be sorted according to the rating before the hyphen. After solving this problem, I'd like to convert the NaN values and 0 values to ones, since I have to take the log of every element in the matrices. However, when I replace the NaN values with 1 and then replace the 0 values with 1, the NaN values reappear.
The object row.order contains the order I would like to follow.
row.order <- c("Aaa", "Aa", "A", "Baa", "Ba", "B", "Caa")
The dput of the list of matrices:
dput(phij.list)
list(structure(c(0.375, 0.268292682926829, 0.384615384615385,
NaN, NaN, 0.222222222222222, NaN, 0.4375, 0.51219512195122, 0.282051282051282,
NaN, NaN, 0.444444444444444, NaN, 0.0625, 0.195121951219512,
0.230769230769231, NaN, NaN, 0.333333333333333, NaN, 0.125, 0.024390243902439,
0.0769230769230769, NaN, NaN, 0, NaN, 0, 0, 0.0256410256410256,
NaN, NaN, 0, NaN, 0, 0, 0, NaN, NaN, 0, NaN, 0, 0, 0, NaN, NaN,
0, NaN), .Dim = c(7L, 7L), .Dimnames = list(hi = c("A-Aaa", "Aa-Aaa",
"Aaa-Aaa", "B-Aaa", "Ba-Aaa", "Baa-Aaa", "Caa-Aaa"), j = c("Aaa",
"Aa", "A", "Baa", "Ba", "B", "Caa"))), structure(c(0.0425531914893617,
0.0641509433962264, 0.27906976744186, 0.0714285714285714, 0,
0.0625, 0, 0.425531914893617, 0.532075471698113, 0.418604651162791,
0.428571428571429, 0.551724137931034, 0.453125, 0, 0.304964539007092,
0.211320754716981, 0.162790697674419, 0.214285714285714, 0.275862068965517,
0.25, 0, 0.113475177304965, 0.132075471698113, 0.116279069767442,
0.142857142857143, 0.137931034482759, 0.140625, 0, 0.0921985815602837,
0.0452830188679245, 0.0232558139534884, 0.142857142857143, 0,
0.0625, 1, 0.0212765957446809, 0.0150943396226415, 0, 0, 0.0344827586206897,
0.03125, 0, 0, 0, 0, 0, 0, 0, 0), .Dim = c(7L, 7L), .Dimnames = list(
hi = c("A-Aa", "Aa-Aa", "Aaa-Aa", "B-Aa", "Ba-Aa", "Baa-Aa",
"Caa-Aa"), j = c("Aaa", "Aa", "A", "Baa", "Ba", "B", "Caa"
))), structure(c(0.00769230769230769, 0.0775193798449612,
0.0869565217391304, 0, 0, 0.00671140939597315, 0, 0.188461538461538,
0.317829457364341, 0.173913043478261, 0.296296296296296, 0.037037037037037,
0.23489932885906, 0.5, 0.496153846153846, 0.341085271317829,
0.478260869565217, 0.333333333333333, 0.462962962962963, 0.342281879194631,
0, 0.207692307692308, 0.193798449612403, 0.260869565217391, 0.222222222222222,
0.333333333333333, 0.281879194630872, 0.5, 0.0884615384615385,
0.062015503875969, 0, 0.111111111111111, 0.111111111111111, 0.087248322147651,
0, 0.00384615384615385, 0.00775193798449612, 0, 0.037037037037037,
0.0555555555555556, 0.0402684563758389, 0, 0.00769230769230769,
0, 0, 0, 0, 0.00671140939597315, 0), .Dim = c(7L, 7L), .Dimnames = list(
hi = c("A-A", "Aa-A", "Aaa-A", "B-A", "Ba-A", "Baa-A", "Caa-A"
), j = c("Aaa", "Aa", "A", "Baa", "Ba", "B", "Caa"))), structure(c(0.0196078431372549,
0.0434782608695652, 0.166666666666667, 0, 0, 0.0116959064327485,
0, 0.163398692810458, 0.159420289855072, 0.666666666666667, 0.0571428571428571,
0.0648148148148148, 0.0994152046783626, 0, 0.300653594771242,
0.347826086956522, 0.166666666666667, 0.285714285714286, 0.222222222222222,
0.251461988304094, 0.333333333333333, 0.274509803921569, 0.260869565217391,
0, 0.314285714285714, 0.37037037037037, 0.350877192982456, 0.333333333333333,
0.163398692810458, 0.130434782608696, 0, 0.228571428571429, 0.194444444444444,
0.233918128654971, 0.333333333333333, 0.065359477124183, 0.0579710144927536,
0, 0.114285714285714, 0.12037037037037, 0.0526315789473684, 0,
0.0130718954248366, 0, 0, 0, 0.0277777777777778, 0, 0), .Dim = c(7L,
7L), .Dimnames = list(hi = c("A-Baa", "Aa-Baa", "Aaa-Baa", "B-Baa",
"Ba-Baa", "Baa-Baa", "Caa-Baa"), j = c("Aaa", "Aa", "A", "Baa",
"Ba", "B", "Caa"))), structure(c(0, 0, 0, 0, 0, 0, 0, 0.150943396226415,
0.212121212121212, 1, 0.02, 0.0285714285714286, 0.0925925925925926,
0, 0.264150943396226, 0.272727272727273, 0, 0.06, 0.104761904761905,
0.138888888888889, 0.214285714285714, 0.415094339622642, 0.212121212121212,
0, 0.12, 0.238095238095238, 0.333333333333333, 0.0714285714285714,
0.0754716981132075, 0.272727272727273, 0, 0.4, 0.333333333333333,
0.305555555555556, 0.214285714285714, 0.0754716981132075, 0,
0, 0.36, 0.247619047619048, 0.101851851851852, 0.357142857142857,
0.0188679245283019, 0.0303030303030303, 0, 0.04, 0.0476190476190476,
0.0277777777777778, 0.142857142857143), .Dim = c(7L, 7L), .Dimnames = list(
hi = c("A-Ba", "Aa-Ba", "Aaa-Ba", "B-Ba", "Ba-Ba", "Baa-Ba",
"Caa-Ba"), j = c("Aaa", "Aa", "A", "Baa", "Ba", "B", "Caa"
))), structure(c(0, 0, NaN, 0, 0, 0, 0, 0, 0.2, NaN, 0.0508474576271186,
0.0476190476190476, 0.128205128205128, 0.0476190476190476, 0.25,
0.2, NaN, 0.101694915254237, 0.142857142857143, 0.179487179487179,
0, 0.333333333333333, 0.4, NaN, 0.0677966101694915, 0.174603174603175,
0.230769230769231, 0.0952380952380952, 0.25, 0.2, NaN, 0.271186440677966,
0.238095238095238, 0.256410256410256, 0.19047619047619, 0.166666666666667,
0, NaN, 0.355932203389831, 0.285714285714286, 0.153846153846154,
0.523809523809524, 0, 0, NaN, 0.152542372881356, 0.111111111111111,
0.0512820512820513, 0.142857142857143), .Dim = c(7L, 7L), .Dimnames = list(
hi = c("A-B", "Aa-B", "Aaa-B", "B-B", "Ba-B", "Baa-B", "Caa-B"
), j = c("Aaa", "Aa", "A", "Baa", "Ba", "B", "Caa"))), structure(c(0,
NaN, NaN, 0, 0, 0, 0, 0, NaN, NaN, 0, 0, 0.142857142857143, 0,
0, NaN, NaN, 0, 0.142857142857143, 0, 0, 0.333333333333333, NaN,
NaN, 0.0526315789473684, 0.214285714285714, 0, 0.0666666666666667,
0.666666666666667, NaN, NaN, 0.263157894736842, 0.142857142857143,
0.428571428571429, 0.0666666666666667, 0, NaN, NaN, 0.473684210526316,
0.214285714285714, 0.285714285714286, 0.466666666666667, 0, NaN,
NaN, 0.210526315789474, 0.285714285714286, 0.142857142857143,
0.4), .Dim = c(7L, 7L), .Dimnames = list(hi = c("A-Caa", "Aa-Caa",
"Aaa-Caa", "B-Caa", "Ba-Caa", "Baa-Caa", "Caa-Caa"), j = c("Aaa",
"Aa", "A", "Baa", "Ba", "B", "Caa"))))
The code I'm using to change NaN to 1:
lapply(phij.list, function(x) replace(x, !is.finite(x), 1))
The code I'm using to change the 0 values to 1
lapply(phij.list, function(x) replace(x, x==0, 1))
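You can use sub to remove the text after "-" in the row names and then match them with row.order to get the correct order. We can then replace NaN and 0 values with 1 in the same function. Doing both replacements in one step also avoids the reappearing NaN values: each of your two lapply calls starts again from the unmodified phij.list, because the result of the first call is never assigned.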
new_list <- lapply(phij.list, function(x) {
# for each rating in row.order, find the row whose prefix matches it
temp <- x[match(row.order, sub('-.*', '', rownames(x))), ]
replace(temp, is.nan(temp) | temp == 0, 1)
})
#[[1]]
# j
#hi Aaa Aa A Baa Ba B Caa
# Aaa-Aaa 0.385 0.282 0.2308 0.0769 0.0256 1 1
# Aa-Aaa 0.268 0.512 0.1951 0.0244 1.0000 1 1
# A-Aaa 0.375 0.438 0.0625 0.1250 1.0000 1 1
# Baa-Aaa 0.222 0.444 0.3333 1.0000 1.0000 1 1
# Ba-Aaa 1.000 1.000 1.0000 1.0000 1.0000 1 1
# B-Aaa 1.000 1.000 1.0000 1.0000 1.0000 1 1
# Caa-Aaa 1.000 1.000 1.0000 1.0000 1.0000 1 1
#[[2]]
# j
#hi Aaa Aa A Baa Ba B Caa
# Aaa-Aa 0.2791 0.419 0.163 0.116 0.0233 1.0000 1
# Aa-Aa 0.0642 0.532 0.211 0.132 0.0453 0.0151 1
# A-Aa 0.0426 0.426 0.305 0.113 0.0922 0.0213 1
# Baa-Aa 0.0625 0.453 0.250 0.141 0.0625 0.0312 1
# Ba-Aa 1.0000 0.552 0.276 0.138 1.0000 0.0345 1
# B-Aa 0.0714 0.429 0.214 0.143 0.1429 1.0000 1
# Caa-Aa 1.0000 1.000 1.000 1.000 1.0000 1.0000 1
#...
#...
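A small sketch of why match() takes row.order as its first argument, using a hypothetical one-column matrix m whose rows are shuffled (unlike the alphabetical row names in phij.list, where both argument orders happen to give the same permutation). It also shows the two-step NaN/0 replacement from the question working once each lapply() result is assigned.

```r
row.order <- c("Aaa", "Aa", "A", "Baa", "Ba", "B", "Caa")

# Hypothetical matrix with shuffled row names, a NaN and two zeros
m <- matrix(c(1, 0, NaN, 4, 0, 6, 7), ncol = 1,
            dimnames = list(c("B-x", "Aaa-x", "Caa-x", "A-x", "Ba-x", "Aa-x", "Baa-x"), "v"))
prefix <- sub("-.*", "", rownames(m))   # rating before the hyphen

# match(row.order, prefix): for each wanted rating, the row that holds it
good <- m[match(row.order, prefix), , drop = FALSE]
# match(prefix, row.order): the rank of each row, which is not a row index
bad <- m[match(prefix, row.order), , drop = FALSE]
rownames(good)   # gives Aaa-x, Aa-x, A-x, Baa-x, Ba-x, B-x, Caa-x
rownames(bad)    # a different, wrong order

# The two replacements only accumulate when each result is assigned
lst <- list(good)
lst <- lapply(lst, function(x) replace(x, !is.finite(x), 1))
lst <- lapply(lst, function(x) replace(x, x == 0, 1))
log(lst[[1]])
```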

Divide each column of a dataframe by one row of the dataframe

I would like to divide each column of my dataframe by the values of one row.
I tried to transform my dataframe into a matrix, extract one row of the dataframe as a vector and then divide the matrix by the vector, but it did not work. In fact, only the first row of the matrix got divided by the vector.
Here is my original dataframe.
And this is the code I tried to run:
library(readxl)
data <- read_excel("Documents/TFB/xlsx_geochimie/solfatara_maj.xlsx")
View(data)
data.mat <- as.matrix(data[,2:20])
vector <- data[12,2:20]
data.mat/vector
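We replicate the vector so that every element of the matrix is paired with the value for its own column (col(data.mat) gives each element's column index) and then do the division: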
data.mat/unlist(vector)[col(data.mat)]
# FeO Total S SO4 Total N SiO2 Al2O3 Fe2O3 MnO MgO CaO Na2O K2O
#[1,] 0.10 16.5555556 NA NA 0.8908607 0.8987269 0.1835206 0.08333333 0.03680982 0.04175365 0.04823151 0.5738562
#[2,] 0.40 125.8333333 NA NA 0.5510204 0.4456019 0.2359551 0.08333333 0.04294479 0.01878914 0.04501608 0.2588235
#[3,] 0.85 0.6111111 NA NA 1.0021295 1.0162037 0.7715356 1.08333333 0.53987730 0.69728601 1.03858521 1.0457516
#[4,] 0.15 48.0555556 NA NA 1.1027507 0.2569444 NA 0.08333333 0.01840491 0.01878914 0.04180064 0.1647059
#[5,] 0.85 NA NA NA 1.0889086 1.0271991 0.6591760 0.75000000 0.59509202 0.53862213 1.02250804 1.1228758
#[6,] NA NA NA NA 1.3426797 0.6319444 0.0411985 0.08333333 0.03067485 0.11899791 0.65594855 0.7764706
# TiO2 P2O5 LOI LOI2 Total Total 2 Fe2O3(T)
#[1,] 0.7924528 0.3928571 7.0841837 6.6963855 0.9922233 0.9894632 0.14489796
#[2,] 0.5094340 0.3214286 14.5561224 13.7710843 0.9958126 0.9936382 0.31020408
#[3,] 0.8679245 0.6428571 1.5637755 1.5228916 0.9990030 0.9970179 0.80612245
#[4,] 1.4905660 0.2857143 7.4056122 7.0024096 0.9795613 0.9769384 0.05510204
#[5,] 1.0377358 0.2500000 0.3520408 0.3783133 0.9969093 0.9960239 0.74489796
#[6,] 0.3018868 0.2500000 1.2551020 1.1879518 1.0019940 1.0000000 0.04489796
Or use sweep:
sweep(data.mat, MARGIN = 2, unlist(vector), FUN = `/`)
Or use mapply with asplit:
mapply(`/`, asplit(data.mat, 2), vector)
Data
data.mat <- structure(c(0.2, 0.8, 1.7, 0.3, 1.7, NA, 5.96, 45.3, 0.22, 17.3,
NA, NA, NA, 6.72, NA, 4.08, 0.06, 0.16, NA, NA, NA, NA, NA, NA,
50.2, 31.05, 56.47, 62.14, 61.36, 75.66, 15.53, 7.7, 17.56, 4.44,
17.75, 10.92, 0.49, 0.63, 2.06, NA, 1.76, 0.11, 0.01, 0.01, 0.13,
0.01, 0.09, 0.01, 0.06, 0.07, 0.88, 0.03, 0.97, 0.05, 0.2, 0.09,
3.34, 0.09, 2.58, 0.57, 0.15, 0.14, 3.23, 0.13, 3.18, 2.04, 4.39,
1.98, 8, 1.26, 8.59, 5.94, 0.42, 0.27, 0.46, 0.79, 0.55, 0.16,
0.11, 0.09, 0.18, 0.08, 0.07, 0.07, 27.77, 57.06, 6.13, 29.03,
1.38, 4.92, 27.79, 57.15, 6.32, 29.06, 1.57, 4.93, 99.52, 99.88,
100.2, 98.25, 99.99, 100.5, 99.54, 99.96, 100.3, 98.28, 100.2,
100.6, 0.71, 1.52, 3.95, 0.27, 3.65, 0.22), .Dim = c(6L, 19L), .Dimnames = list(
NULL, c("FeO", "Total S", "SO4", "Total N", "SiO2", "Al2O3",
"Fe2O3", "MnO", "MgO", "CaO", "Na2O", "K2O", "TiO2", "P2O5",
"LOI", "LOI2", "Total", "Total 2", "Fe2O3(T)")))
vector <- structure(list(FeO = 2, `Total S` = 0.36, SO4 = NA_real_, `Total N` = NA_real_,
SiO2 = 56.35, Al2O3 = 17.28, Fe2O3 = 2.67, MnO = 0.12, MgO = 1.63,
CaO = 4.79, Na2O = 3.11, K2O = 7.65, TiO2 = 0.53, P2O5 = 0.28,
LOI = 3.92, LOI2 = 4.15, Total = 100.3, `Total 2` = 100.6,
`Fe2O3(T)` = 4.9), row.names = c(NA, -1L), class = c("tbl_df",
"tbl", "data.frame"))
To divide a data frame df by its third row:
df/df[rep(3, nrow(df)), ]
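A small sketch of why the vector has to be replicated: R recycles a shorter vector down the columns of a matrix (column-major order), so m / ref pairs the i-th element of m with ref[(i - 1) %% length(ref) + 1] instead of with the value for its column. The 3 x 2 matrix m is a hypothetical example.

```r
# Hypothetical 3 x 2 matrix; we want to divide each column by row 2
m <- matrix(c(2, 4, 6, 10, 20, 30), nrow = 3,
            dimnames = list(NULL, c("a", "b")))
ref <- m[2, ]   # c(a = 4, b = 20)

# Recycling runs down the columns, so this divides by 4, 20, 4, 20, 4, 20
wrong <- m / ref

# Three ways to divide column j by ref[j]
r1 <- m / rep(ref, each = nrow(m))   # same effect as unlist(vector)[col(data.mat)]
r2 <- sweep(m, MARGIN = 2, STATS = ref, FUN = "/")
r3 <- t(t(m) / ref)                  # transpose so recycling runs along the rows of m

r1
wrong
```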

Print data.frame structure as character

I have a data.frame that looks like this:
df <- data.frame(
y = c(0.348, 0.099, 0.041, 0.022, 0.015, 0.010, 0.007, 0.005, 0.004, 0.003),
x = c(458, 648, 694, 724, 756, 790, 818, 836, 848, 876))
When I print the data.frame I (obviously) get this output:
df
# y x
# 1 0.348 458
# 2 0.099 648
# 3 0.041 694
# 4 0.022 724
# 5 0.015 756
# 6 0.010 790
# 7 0.007 818
# 8 0.005 836
# 9 0.004 848
# 10 0.003 876
Is there any function where I can print the data.frame as a character string (or similar)?
magic_function(df)
# output
"df <- data.frame(
y = c(0.348, 0.099, 0.041, 0.022, 0.015, 0.010, 0.007, 0.005, 0.004, 0.003),
x = c(458, 648, 694, 724, 756, 790, 818, 836, 848, 876))"
I literally want to print out something like "df <- data.frame(x = c(...), y = c(...))" so that I can copy the output and paste it into a Stack Overflow question (for reproducibility)!
I just had to do this recently. deparse will do the trick, and you can paste the multi-line output into a single string with collapse:
df.as.char <- paste(deparse(df), collapse = "")
df.as.char
# [1] "structure(list(y = c(0.348, 0.099, 0.041, 0.022, 0.015, 0.01, 0.007, 0.005, 0.004, 0.003), x = c(458, 648, 694, 724, 756, 790, 818, 836, 848, 876)), .Names = c(\"y\", \"x\"), row.names = c(NA, -10L), class = \"data.frame\")"
Depending on the size of your object, you might consider increasing the width.cutoff argument to deparse (which reduces the number of lines deparse creates).
If you have the same goal in mind that I did, you can turn the string back into an object with:
df.from.char <- eval(parse(text = df.as.char))
df.from.char
# y x
# 1 0.348 458
# 2 0.099 648
# 3 0.041 694
# 4 0.022 724
# 5 0.015 756
# 6 0.010 790
# 7 0.007 818
# 8 0.005 836
# 9 0.004 848
# 10 0.003 876
identical(df.from.char, df)
# [1] TRUE
And if you really need the assignment arrow to be part of the string, just paste0 it in.
One option is to use:
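A sketch of that paste0() step, assuming a hypothetical target name df_copy; it also checks that evaluating the string recreates an identical object.

```r
df <- data.frame(
  y = c(0.348, 0.099, 0.041),
  x = c(458, 648, 694)
)

# Prepend the assignment to the single-line deparse() output
code <- paste0("df_copy <- ", paste(deparse(df), collapse = ""))
cat(code, "\n")

# Evaluating the string creates df_copy in the current environment
eval(parse(text = code))
identical(df_copy, df)
```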
dput(df)
returns:
structure(list(y = c(0.348, 0.099, 0.041, 0.022, 0.015, 0.01,
0.007, 0.005, 0.004, 0.003), x = c(458, 648, 694, 724, 756, 790,
818, 836, 848, 876)), .Names = c("y", "x"), row.names = c(NA,
-10L), class = "data.frame")
I think I got something!
df4so <- function(df) {
# collapse dput
# shout out to KonradRudolph, Roland and MichaelChirico
a <- paste(capture.output(dput(df)), collapse = "")
# remove the leading "structure(list("
b <- sub("^structure\\(list\\(", "", a)
# remove the trailing attributes: newer R versions print
# ', class = "data.frame", row.names = ...', older ones ', .Names = ...'
c <- sub(",\\s*class\\s*=\\s*\"data\\.frame\",\\s*row\\.names\\s*=.*$", "", b)
c <- sub(",\\s*\\.Names\\s*=.*$", "", c)
# put it all together
e <- paste0("df <- data.frame(", c)
# print and return the string
print(e)
}
df4so(df)
Output:
[1] "df <- data.frame(y = c(0.348, 0.099, 0.041, 0.022, 0.015, 0.01, 0.007, 0.005, 0.004, 0.003), x = c(458, 648, 694, 724, 756, 790, 818, 836, 848, 876))"
Suitable for copying and pasting to stackoverflow!
