Need some advice on using the reshape2::melt function for ggplot-ing - r

I need help writing a command to do the following:
I have two data frames (which I plan to combine into one to do some ggplotting) of the following form:
df1
|..D..|..A...|..B...|
| d1 | a11 | b11 |
| d2 | a12 | b12 |
| d3 | a13 | b13 |
df2
|..D.|..A....|..B....|
| d1 | a21 | b21 |
| d2 | a22 | b22 |
| d3 | a23 | b23 |
The values in the "D" column are the same for both tables, and the variables A and B have the same name, but the values are different. I need to get an output table of the following form:
df3
|..D..|..A...|..B...|Class|
| d1 | a11 | b11 | df1 |
| d2 | a12 | b12 | df1 |
| d3 | a13 | b13 | df1 |
| d1 | a21 | b21 | df2 |
| d2 | a22 | b22 | df2 |
| d3 | a23 | b23 | df2 |
I could just rbind both tables, and I think this can also be done with the "melt" function, but I have not been able to make it work.

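reshape2 is more or less deprecated (it has been retired in favour of tidyr). If you want a tidyverse solution, you can do:
library(dplyr)
# .id adds a leading column holding the name given to each data frame
df3 <- bind_rows(df1 = df1, df2 = df2, .id = "Class")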

Just use cbind then rbind. Make use of R's recycling ability.
df1 <- cbind(mtcars,Class="df1")
df2 <- cbind(mtcars,Class="df2")
rbind(df1,df2)
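The cbind-then-rbind idea can be generalised to any number of data frames. The following is a minimal base R sketch, not the answerer's code: the numeric values in `df1` and `df2` are made-up stand-ins for the placeholders in the question, and the list name `dfs` is hypothetical.

```r
# Hypothetical stand-ins for df1 and df2 from the question
df1 <- data.frame(D = c("d1", "d2", "d3"), A = c(11, 12, 13), B = c(0.1, 0.2, 0.3))
df2 <- data.frame(D = c("d1", "d2", "d3"), A = c(21, 22, 23), B = c(1.1, 1.2, 1.3))

# A named list, so the names can be reused as the Class label
dfs <- list(df1 = df1, df2 = df2)

# Add a Class column to each data frame (the single label is recycled
# down all rows), then stack the pieces on top of each other
df3 <- do.call(rbind, Map(function(d, nm) cbind(d, Class = nm), dfs, names(dfs)))
rownames(df3) <- NULL  # drop the "df1.1"-style row names that rbind creates

df3
```

With the frames stacked this way, `Class` can be mapped directly to a ggplot aesthetic such as `colour` or used in `facet_wrap()`.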


total() in tab_cols only sums up to one, any suggestions?

Suppose I have dataframe 'y'
WR<-c("S",'J',"T")
B<-c("b1","b2","b3")
wgt<-c(0.3,2,3)
y<-data.frame(WR,B,wgt)
I want to make a column-percentage crosstab with B as rows, and WR plus the total of WR as columns, using the expss package:
library(expss)
y %>% tab_cols(total(),WR) %>% # Columns
tab_stat_valid_n("Base") %>%
tab_weight(wgt) %>%
tab_stat_valid_n("Projection") %>%
tab_cells(mrset(B))%>% # Row
tab_stat_cpct(total_row_position = "none") %>%
tab_pivot()
Result
But the Base value in the #Total column does not add up (it shows 1 instead of 3):
# #Total WR|J WR|S WR|T
# Base 1.000000 1 1.0 1
# Projection 5.300000 2 0.3 3
# b1 5.660377 NA 100.0 NA
# b2 37.735849 100 NA NA
# b3 56.603774 NA NA 100
I think I found the solution: call tab_cells() before the tab_stat_* functions.
y %>% tab_cols(total(),WR) %>% # Columns
tab_cells(mrset(B))%>% # Row
tab_stat_valid_n("Base") %>%
tab_weight(wgt) %>%
tab_stat_valid_n("Projection") %>%
tab_stat_cpct(total_row_position = "none") %>%
tab_pivot()
| | | #Total | WR | | |
| | | | J | S | T |
| -- | ---------- | ------ | --- | ----- | --- |
| B | Base | 3.0 | 1 | 1.0 | 1 |
| | Projection | 5.3 | 2 | 0.3 | 3 |
| b1 | | 5.7 | | 100.0 | |
| b2 | | 37.7 | 100 | | |
| b3 | | 56.6 | | | 100 |
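As a cross-check on the #Total column, the same numbers can be reproduced without expss. This is only a base R sketch of the arithmetic (unweighted base, weighted projection, weighted column percentages); the variable names `base_n`, `projection` and `total_cpct` are made up for illustration.

```r
WR  <- c("S", "J", "T")
B   <- c("b1", "b2", "b3")
wgt <- c(0.3, 2, 3)
y   <- data.frame(WR, B, wgt)

# Unweighted base: number of valid cases
base_n <- nrow(y)

# Weighted base ("Projection"): sum of the weights
projection <- sum(y$wgt)

# Weighted column percentages for the total column:
# sum of weights per level of B, as a share of all weights
total_cpct <- 100 * prop.table(xtabs(wgt ~ B, data = y))

base_n
projection
round(total_cpct, 1)
```

The unweighted base of 3 and the percentages 5.7, 37.7 and 56.6 agree with the #Total column of the corrected table.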

density plot for different groups in the same dataframe

I have a df with IDs and values, and I would like to generate a density plot for every unique ID to check whether each distribution is normal or skewed. There are also NA values and I am not sure how to treat them. Should I just remove them and then create the density plot? Also, the range of the values differs between the IDs.
| ID | Values |
| -------- | ------- |
| F1 | 45 |
| F1 | 56 |
| F1 | NA |
| F1 | 68 |
| F1 | 55 |
| F2 | 23 |
| F2 | 44 |
| F2 | 34 |
| F2 | NA |
| F2 | NA |
| F2 | 34 |
| F3 | 5055 |
| F3 | 4567 |
| F3 | NA |
| F3 | 4789 |
| F3 | 5567 |
| F3 | 6002 |
| F4 | 9045 |
| F4 | 9500 |
| F4 | 9760 |
| F4 | NA |
| F4 | 9150 |
Please help, as I am a beginner at visualization.
You don't need to remove the NAs yourself; ggplot2 drops them from the plot (with a warning). You have at most 5 values per ID in your dataset, so a density plot is not so useful. Still, for your example above, we can put the values on a log10 scale (the ranges differ widely between IDs) and try a density:
library(ggplot2)
ggplot(df, aes(x = Values, colour = ID)) + geom_density() + scale_x_log10()
A stripchart might be more useful:
# jitter along the ID axis only, so the plotted values are not altered
ggplot(df, aes(x = Values, y = ID)) + geom_jitter(height = 0.1, width = 0) + scale_x_log10()
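With so few points per ID, a small numeric summary may be a useful companion to the plots. The sketch below is an addition in base R, not part of the original answer: it rebuilds the table from the question, drops the NAs explicitly, and compares mean and median per ID (a mean well above the median hints at right skew). The name `summary_by_id` is made up.

```r
# Data from the question
df <- data.frame(
  ID = rep(c("F1", "F2", "F3", "F4"), times = c(5, 6, 6, 5)),
  Values = c(45, 56, NA, 68, 55,
             23, 44, 34, NA, NA, 34,
             5055, 4567, NA, 4789, 5567, 6002,
             9045, 9500, 9760, NA, 9150)
)

# Keep only rows with a non-missing value
ok <- df[!is.na(df$Values), ]

# Split the values by ID and summarise each group
by_id <- split(ok$Values, ok$ID)
summary_by_id <- data.frame(
  ID     = names(by_id),
  n      = lengths(by_id),          # non-missing observations per ID
  mean   = sapply(by_id, mean),
  median = sapply(by_id, median),
  row.names = NULL
)

summary_by_id
```

With only 4 or 5 observations per ID, any statement about normality or skewness remains very tentative.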

Rearranging Table in R using tidyverse

I have data formatted in the following way:
-------------------------
| A01 | value | |
-------------------------
| A01 | value | |
-------------------------
| A01 | value | |
-------------------------
| A02 | value | |
-------------------------
| A02 | value | |
-------------------------
| A02 | value | |
-------------------------
| A03 | value | |
-------------------------
| A03 | value | |
-------------------------
| A03 | value | |
-------------------------
| A04 | value | |
-------------------------
| A04 | value | |
-------------------------
| A04 | value | |
I want to extract the values from the rows labeled A02 and paste them in separate columns beside the rows labeled A01, and similarly for A03 and A04, and so on.
Basically I want to rearrange like this:
-------------------------
| A01 | value | A02 | value |
-------------------------
| A01 | value | A02 | value |
-------------------------
| A01 | value | A02 | value |
-------------------------
| A03 | value | A04 | value |
-------------------------
| A03 | value | A04 | value |
-------------------------
| A03 | value | A04 | value |
I am learning the tidyverse in R, but I am very new and I have not been able to find the right function to do this yet. I would appreciate any help. Thanks in advance.
With this approach you find all rows whose label has an even number and all rows whose label has an odd number, separate them, and bind them together side by side afterwards. It assumes that the labels go no higher than A09 (if larger numbers are present, you have to modify the substr call). It also only works if there are equally many even- and odd-labelled rows. For your example data it works as requested!
library(tibble)
library(dplyr)
library(tidyr)
## example data ##
value = c(rep(1:4,3))
who = paste0("A0", c(rep(1:4,3)) )
tbl <- tibble::tibble(who = who, value = value)
## substr(who,3,3) extracts the last character of the label, as.numeric() turns it into a number
## %% 2 == 0: modulo division returns the remainder; only even numbers have remainder 0
tbl <- tbl %>% mutate(is_even = as.numeric(substr(who,3,3)) %% 2 == 0)
## Filter all rows with even number in label
tbl_even <- tbl %>% filter(is_even == TRUE) %>% dplyr::select(-is_even)
## Filter all rows with odd number in label
tbl_odd <- tbl %>% filter(is_even == FALSE) %>% dplyr::select(-is_even)
## Bind the odd and even rows together side by side (by position, not by key)
result <- tbl_odd %>% cbind(tbl_even)
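The single-digit limitation comes from `substr(who, 3, 3)`. One possible way around it, sketched here in base R under the assumption that every label is a run of non-digits followed by a number, is to strip the prefix and parse whatever digits remain. The helper `label_num` and the labels A11 and A12 are made up for illustration.

```r
# Example data with labels beyond A09 (A11 and A12 are hypothetical)
who   <- sprintf("A%02d", rep(c(1:4, 11:12), times = 3))
value <- seq_along(who)
tbl   <- data.frame(who, value)

# Hypothetical helper: drop the leading non-digit prefix, parse the rest
label_num <- function(x) as.integer(sub("^[^0-9]+", "", x))

num  <- label_num(tbl$who)
odd  <- tbl[num %% 2 == 1, ]
even <- tbl[num %% 2 == 0, ]

# The side-by-side pairing only makes sense with equally many rows
stopifnot(nrow(odd) == nrow(even))

result <- data.frame(
  who_odd    = odd$who,
  value_odd  = odd$value,
  who_even   = even$who,
  value_even = even$value
)

result
```

Like the original, this pairs rows purely by position, so it relies on the odd and even labels appearing in matching order.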

Populate table column by relative row values of another column in R

I’m trying to populate a table column by relative row values of another column in R. I have a table with two data columns (Data1, Data2) and two point value columns (P1, P2). Data1 is populated, Data2 is not. I want Data2 to be populated with the value in either P1 or P2, based on the relative value of Data1. In a given row, if the previous value of Data1 is higher than its current value, the Data2 cell is populated with the value in P1. If the previous value of Data1 is lower than its current value, the Data2 cell is populated with the value in P2. To illustrate what I’m trying to do, I’ve provided two sample tables. The first table is what I have (Data2 is not populated), and the second table is the desired outcome.
Table1 (What I have)
+-----+----+----+-------+-------+
| FID | P1 | P2 | Data1 | Data2 |
+-----+----+----+-------+-------+
| 1 | A | B | 50 | |
| 2 | C | D | 40 | |
| 3 | E | F | 60 | |
| 4 | G | H | 70 | |
| 5 | I | J | 65 | |
Table2 (Desired Outcome)
+-----+----+----+-------+-------+
| FID | P1 | P2 | Data1 | Data2 |
+-----+----+----+-------+-------+
| 1 | A | B | 50 | NA |
| 2 | C | D | 40 | C |
| 3 | E | F | 60 | F |
| 4 | G | H | 70 | H |
| 5 | I | J | 65 | I |
+-----+----+----+-------+-------+
Is there a built in function in R to accomplish this? If not, any advice on how to create one?
A solution using dplyr could be:
library(dplyr)
df %>%
  mutate(Data2 = ifelse(lag(Data1) > Data1, paste0(P1), paste0(P2)))
FID P1 P2 Data1 Data2
1 1 A B 50 <NA>
2 2 C D 40 C
3 3 E F 60 F
4 4 G H 70 H
5 5 I J 65 I
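For comparison, the same lag-and-compare logic can be sketched in base R. This is an illustrative alternative, not the answerer's code; `prev` is a made-up helper name, and the data frame is rebuilt from Table1 in the question.

```r
# Table1 from the question
df <- data.frame(
  FID   = 1:5,
  P1    = c("A", "C", "E", "G", "I"),
  P2    = c("B", "D", "F", "H", "J"),
  Data1 = c(50, 40, 60, 70, 65),
  stringsAsFactors = FALSE
)

# Previous row's Data1; the first row has no predecessor, hence NA
prev <- c(NA, head(df$Data1, -1))

# P1 where the previous value was higher, otherwise P2;
# the NA in the first row propagates to Data2
df$Data2 <- ifelse(prev > df$Data1, df$P1, df$P2)

df
```

Note that in both versions a tie (previous value equal to the current one) falls through to P2, a case the question does not specify.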

SQLite syntax - join data from 3 tables

I have 3 sqlite tables:
table inspections, where id is the primary key
id | name | deleted
------------------------------
I1 | Inspection A | (null)
I2 | Inspection B | (null)
I3 | Inspection C | 1
table equip_insp, where equip_id and insp_id together form the primary key
equip_id | insp_id | period | period_type
--------------------------------------------
E1 | I1 | 1 | Y
E1 | I2 | 6 | M
E2 | I1 | 1 | M
table equip_certif, where id is primary key
id | equip_id | insp_id | date | certif_no | result | info
-------------------------------------------------------------------
C4 | E1 | I1 | 2015-02-01 | A-300 | Good | (null)
C3 | E1 | I1 | 2015-02-01 | A-200 | Good | (null)
C2 | E1 | I1 | 2015-01-10 | A-100 | Good | (null)
C1 | E1 | I2 | 2015-01-06 | B-100 | Good | (null)
All IDs are in fact numeric values; I use letters here just to make it easy to see how they connect.
I would like help with the SQLite syntax that, for item E1, displays all the defined inspections (ascending), then, if it exists, the periodicity, and then the date, number and result of the latest certificate whose info is null (if there are 2 certificates on the same date, take the one with the latest id).
Result should be something like this:
id | name | period | period_type | certif_no | date | result
--------------------------------------------------------------------------
I1 | Inspection A | 1 | Y | A-300 | 2015-02-01 | Good
I2 | Inspection B | 6 | M | B-100 | 2015-01-06 | Good
I've tried this, but I'm not sure it is correct.
SELECT inspections.id, inspections.name, equip_insp.period, equip_insp.period_type, equip_certif.certif_no, equip_certif.date AS certif_date, equip_certif.result
FROM inspections
LEFT JOIN equip_insp ON (inspections.id = equip_insp.insp_id AND equip_insp.equip_id = 'E1')
LEFT JOIN equip_certif ON (inspections.id = equip_certif.insp_id AND equip_certif.info IS NULL)
WHERE inspections.deleted IS NULL
GROUP BY equip_insp.insp_id
ORDER BY inspections.id, date(equip_certif.date) DESC, equip_certif.id DESC
To specify which row from a group gets returned, you must use MAX(); otherwise, you get some random row:
SELECT ..., MAX(equip_certif.date) AS certif_date, ...
FROM ...
GROUP BY equip_insp.insp_id
...
(This works only in SQLite 3.7.11 or later; in earlier versions, the query would get more complex.)
After playing with SQLite I got the solution by myself. So the answer is:
SELECT inspections.id, inspections.name, equip_insp.period, equip_insp.period_type, equip_certif.certif_no, equip_certif.date AS certif_date, equip_certif.result
FROM inspections
LEFT JOIN equip_insp ON (inspections.id = equip_insp.insp_id AND equip_insp.equip_id = 'E1')
LEFT JOIN equip_certif ON (inspections.id = equip_certif.insp_id AND equip_certif.equip_id = equip_insp.equip_id AND equip_certif.info IS NULL)
WHERE inspections.deleted IS NULL
GROUP BY inspections.id
ORDER BY inspections.id, date(equip_certif.date) DESC, equip_certif.id DESC
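The tie-break rule from the question (latest date, then latest id) can also be stated explicitly with a correlated subquery instead of relying on which row GROUP BY happens to keep. The sketch below is an alternative formulation, not the poster's query; it loads the sample tables into an in-memory database from R and assumes the DBI and RSQLite packages are installed. The aliases `i`, `ei`, `c` and `c2` are made up, and the letter-prefixed ids are kept only to match the tables above (real ids are numeric).

```r
library(DBI)

con <- dbConnect(RSQLite::SQLite(), ":memory:")

# Sample tables from the question
dbWriteTable(con, "inspections", data.frame(
  id = c("I1", "I2", "I3"),
  name = c("Inspection A", "Inspection B", "Inspection C"),
  deleted = c(NA, NA, 1L)
))
dbWriteTable(con, "equip_insp", data.frame(
  equip_id = c("E1", "E1", "E2"),
  insp_id = c("I1", "I2", "I1"),
  period = c(1L, 6L, 1L),
  period_type = c("Y", "M", "M")
))
dbWriteTable(con, "equip_certif", data.frame(
  id = c("C4", "C3", "C2", "C1"),
  equip_id = "E1",
  insp_id = c("I1", "I1", "I1", "I2"),
  date = c("2015-02-01", "2015-02-01", "2015-01-10", "2015-01-06"),
  certif_no = c("A-300", "A-200", "A-100", "B-100"),
  result = "Good",
  info = NA_character_
))

# For each inspection of E1, join only the single certificate that sorts
# first by date (newest), then by id (highest), among rows with info NULL
res <- dbGetQuery(con, "
  SELECT i.id, i.name, ei.period, ei.period_type,
         c.certif_no, c.date AS certif_date, c.result
  FROM inspections AS i
  LEFT JOIN equip_insp AS ei
         ON ei.insp_id = i.id AND ei.equip_id = 'E1'
  LEFT JOIN equip_certif AS c
         ON c.id = (SELECT c2.id
                    FROM equip_certif AS c2
                    WHERE c2.equip_id = ei.equip_id
                      AND c2.insp_id = ei.insp_id
                      AND c2.info IS NULL
                    ORDER BY c2.date DESC, c2.id DESC
                    LIMIT 1)
  WHERE i.deleted IS NULL
  ORDER BY i.id
")

dbDisconnect(con)
res
```

Because the subquery returns at most one id per inspection, no GROUP BY is needed and the selected certificate does not depend on the SQLite version.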
