How to append rows of data frame to the first one in R? - r

I have a data frame that I am trying to condense from multiples rows into one row.The data set is fairly large, but I am starting with a small subset. So here I want to turn 2 rows into 1; I want the information to follow the information in the first row.
The original problem was that I had a column of data that I need to "flatten" so that I can use the bits and pieces. The column is in JSON format.
"[{\"task\":\"T0\",\"task_label\":\"Did any birds visit the feeding platform or bird feeders?\",\"value\":\"**Yes**—but there were no displacements. Next, enter all of the birds you see at the feeders. \"},{\"task\":\"T1\",\"value\":[{\"choice\":\"EUROPEANSTARLING\",\"answers\":{\"WHATISTHELARGESTNUMBEROFINDIVIDUALSTHATYOUSAWSIMULTANEOUSLY\":\"4\"},\"filters\":{}},{\"choice\":\"MOURNINGDOVE\",\"answers\":{\"WHATISTHELARGESTNUMBEROFINDIVIDUALSTHATYOUSAWSIMULTANEOUSLY\":\"2\"},\"filters\":{}}]},{\"task\":\"T6\",\"task_label\":\"Is it actively precipitating (rain or snow)?\",\"value\":[\"Yes.\"]}]"
So I used code developed by another coder to "flatten" this out by task. Then, I want to join it back up so that I have one line of information for each classification.
Currently, I have merged tasks T0 and T4, but I need to merge this to another task, T5. In order to do that, I need to reduce the data in merge of T0 and T4 to one row. So right now I'm working with a small subset of the data and have a table that essentially looks like this:
x <- data.frame("subject_ids" = c(19232716, 19232716), "classification_id" = c(120545061,120545061), "task_index.x" = c(1,1),
"task.x" = c("TO","TO"), "value" = c("Displacement","Displacement"), "task_index.y"=c(2,5), "task.y"= c("T4, T4","T4"),
"total.species"=c("2,2","1"), "choice" = c("MOURNINGDOVE, COMMONGRACKLE","MOURNINGDOVE"), "S_T"=c("Target,Target","Target,Source"))
but I want it to look like this:
y <- data.frame("subject_ids" = c(19232716), "classification_id" = c(120545061), "task_index.x" = c(1),
"task.x" = c("TO"), "value" = c("Displacement"), "task_index.y"=c(2), "task.y"= "T4, T4",
"total.species"=c("2,2"), "choice" = c("MOURNINGDOVE, COMMONGRACKLE"), "S_T"=c("Target,Target"),
"task_index.y"=c(5), "task.y"= "T4",
"total.species"=c("1"), "choice" = c("MOURNINGDOVE"), "S_T"=c("Target,Source"))

Related

hvplot variable coordinate quadmesh

I have an Xarray Dataset which looks like
I would like to be able to select data variables (which i already know how to do) and plot a quadmesh of the data variable selected. The problem is that different data variables have different numbers and types of coordinates which means the closest I got is having a really long hard coded switch statement to handle each possible data variable and have a sel on it with the appropriate number and type of coordinates (for ES I would need to select on pres1 and time). How would I approach this in order to be able to allow the user to visualize the geospatial data no matter the data variable selected?
Here is my current attempt :
select_field = pnw.Select(name="Field", options=list(xds.data_vars))
select_time = pnw.Select(name="Time", options=list(xds.coords['time'].values))
select_pres1 = pnw.Select(name="Pressure 1", options=list(xds.coords['pres1'].values))
def fieldFiltered(select_field):
return xds[select_field]
xdsi = hvplot.bind(fieldFiltered, select_field).interactive(sizing_mode='stretch_both')
wid = xdsi.sel( pres1=select_pres1 , time=select_time ).widgets()
ploti = xdsi.sel(time=select_time).hvplot()
pn.Row(
wid[1],
wid[2],
ploti
)
which gives me something similar to
but breaks for any other variable

Merging two data.tables results in data.table of only one repetitive merge factor

I am merging a single column dt with a helper table dt, to eventually be added to a larger dt.
My data.table named "full" is a wide and long data.table, of which I am selecting column "category_product" which is a series of categories of snacks. I copied full$category_product to its own dt.
Helper1 is a dt that has each category of snack listed once, along with a column of the name of the snack and a number associated. I need to do this VLOOKUP style so that the name of each snack and number is listed down the categories.
The line I'm using is this and its variations is:
helper_categories = merge(assist1, helper1)
This is my process:
assist1 = data.table(full$category_product)
setnames(assist1, old = c('V1'),new = c('category_product'))
assist1[] <- lapply(assist1, as.character)
helper_categories = merge(assist1, helper1)
My assist1 table (is over 1 million rows long but) looks like:
assist1 = data.table(category_product = 'Salty','Water','Juice, tea and smoothies')
And my complete helper1 table looks like:
helper1 = data.table(
category_product = c('Sugar candy','Cookies, pastries and cereals','Chocolate based','Salty','Juice, tea and smoothies','Carbonates and energy drinks','Water','Milk based'),
ph1 = c('sugar_confectionary_incl_gums_1','bakery_and_pastries_1','chocolate_based_9','salty_excl_crisps_4','tea_and_coffee_based_4','unflavoured_carbonates_3','water_(flavoured)_2','milk_and_milk-based_4'),
mh1 = c(2.054545,2.618182,2.172727,2.963636,3.136364,2.390909,3.154545,2.936364))
However, the results I'm getting when I open helper_categories is just a data.table with 'Carbonates and energy drinks' and its info repeated over and over and over again. It looks like this:
helper_categories = data.table(category_product = rep('Carbonates and energy drinks', 20), ph1 = rep('unflavoured_carbonates_3', 20), mh1 = rep(2.390909, 20)
Any idea what's going on or how to fix it? I adjusted the class of the common column to be character as I thought that could be it, but to no avail.

How can I get data tables to be refreshed in Spotfire if I am using a data function to import data from Qualtrics?

I am using the package qualtRics in TERR in Spotfire to pull in data directly from specific surveys in Qualtrics. The code I am using is:
registerApiKey(API.TOKEN = "xxxx")
df <- getSurvey(surveyID = "xxxx",
root_url = "https://az1.qualtrics.com", verbose = TRUE)
My output df is a data table. I have 2 different surveys that I am pulling in 4 different times, 2 of those times I am unpivoting data, for a total of 4 data tables.
I want to be able to refresh this data. If I click Reload Data or try to refresh each table individually, nothing happens. I'm assuming I need to add some code that refreshes the data function (?), and I am trying to avoid replacing the data tables each time because, for 2 of those, I have to manually select which columns I am unpivoting (and I have 75+ columns).
Is there a way I can accomplish what I'm looking for? I am a beginner Spotfire/R user, so I am learning as I go!
I am not able to reply to your question as i dont have enough permission so keeping it as separate answer.
Replacing table each time is good idea,
By This you can fix your no of columns for pivoting/UnPivoting.
------R Code
row <- data.frame(Data_Points = nrows,
Col1 = col1, Col2 = col2, YStart = y1, YEnd = y2)
row <- cbind(df, row)
return(row)
And also you can list your fix columns into DocumentProperty and loop it into your DataFunction.
Instead of using spotfire's pivot/unpivot, you can try doing the unpivot within the R code of the data function.

Merge is duplicating rows in r

I have two data sets with country names in common.
first data frame
As you can see, both data sets have a two letter country code formated the same way.
After running this code:
merged<- merge(aggdata, Trade, by="Group.1" , all.y = TRUE, all.x=TRUE)
I get the following result
Rather than having 2 rows with the same country code, I'd like them to be combine.
Thanks!
I strongly suspect that the Group.1 strings in one or other of your data frames has one or more trailing spaces, so they appear identical when viewed, but are not. An easy way of visually checking whether they are the same:
levels(as.factor(Trade$Group.1))
levels(as.factor(aggdata$Group.1))
If the problem does turn out to be trailing spaces, then if you are using R 3.2.0 or higher, try:
Trade$Group.1 <- trimws(Trade$Group.1)
aggdata$Group.1 <- trimws(aggdata$Group.1)
Even better, if you are using read.table etc. to input your data, then use the parameter strip.white=TRUE
For future reference, it would be better to post at least a sample of your data rather than a screenshot.
The following works for me:
aggdata <- data.frame(Group.1 = c('AT', 'BE'), CASEID = c(1587.6551, 506.5), ISOCNTRY = c(NA, NA),
QC17_2 = c(2.0, 1.972332), D70 = c(1.787440, 1.800395))
Trade <- data.frame(Group.1 = c('AT', 'BE'), trade = c(99.77201, 100.10685))
merged<- merge(aggdata, Trade, by="Group.1" , all.y = TRUE, all.x=TRUE)
I had to transcribe your data by hand from your screenshots, so I only did the first two rows. If you could paste in a full sample of your data, that would be helpful. See here for some guidelines on producing a reproducible example: https://stackoverflow.com/a/5963610/236541

How to group repeating sequences of numbers using R

The simplest description of what I am trying to do is that I have a column in a data.frame like 1,2,3,..., n, 1,2,3,...n,.... and I want group the first 1...n as 1 the second 1...n as 2 and so on.
The full context is; I am using the R spcosa package to do equal area stratification composite sampling on parcels of land. I start with a shape file from a GIS that contains a number of polygons (land parcels). The end result I want is a GIS file with each of the strata and sample locations in a GIS file format with each stratum and sample location labeled by land parcel, stratum and sample id. So far I can do all this except one bit which is identifying the stratum that the samples belongs too and including it in the sample label. The sample label needs to look like "parcel#-strata#-composite# (where # is the number). In practice I don't need this actual label but as separate attributes in GIS file.
The basic work flow is a follows
For each individual polygon using spcosa::stratify I divide it into a number of equal area strata like
strata.CSEA <- stratify(poly[i,], nStrata = n, nTry = 1, equalArea = TRUE, nGridCells = x)
Note spcosa::stratify generates a CompactStratificationEqualArea object. I cocerce this to a SpatialPixelData then use rasterToPolygon to be able to output it as a GIS file.
I then generate the sample locations as follows:
samples.SPRC <- spsample(strata.CSEA, n = n, type = "composite")
spcosa::spsample creates a SamplingPatternRandomComposite object. I coerce this to a SpatialPointsDataFrame
samples.SPDF <- as(samples.SPRC, "SpatialPointsDataFrame")
and add two columns to the #data slot
samples.SPDF#data$Strata <- "this is the bit I can't do yet"
samples.SPDF#data$CEA <- poly[i,]$name
I can then write samples.SPDF as a GIS file ( ie writeOGE) with all the wanted attributes.
As above the part I can't sort out is how the sample ids relate to the strata ids. The sample points are a vector like 1,2,3...n, 1,2,3...n,.... How do I extract which sample goes with which strata? As actual strata number are arbitrary, I can just group ( as per my simple question above) but ideally I would like to use the numbering of the actual strata so everything lines up.
To give any contributors access to a hands on example I copy below the code from the spcosa documentation slightly modified to generate the correct objects.
# Note: the example below requires the 'rgdal'-package You may consider the 'maptools'-package as an alternative
if (require(rgdal)) {
# read a vector representation of the `Farmsum' field
shpFarmsum <- readOGR(
dsn = system.file("maps", package = "spcosa"),
layer = "farmsum"
)
# stratify `Farmsum' into 50 strata
# NB: increase argument 'nTry' to get better results
set.seed(314)
myStratification <- stratify(shpFarmsum, nStrata = 50, nTry = 1, equalArea = TRUE)
# sample two sampling units per stratum
mySamplingPattern <- spsample(myStratification, n = 2 type = "composite")
# plot the resulting sampling pattern on
# top of the stratification
plot(myStratification, mySamplingPattern)
}
Maybe order() function can help you
n <- 10
dat <- data.frame(col1 = rep(1:n, 2), col2 = rnorm(2*n))
head(dat)
dat[order(dat$col1), ]
I did not get where the "ID" (1,2,3...n) is to be found; so let's assume you have your SpatialPolygonsDataFrame called shpFarmsum with a attribute data column "ID". You can access this column via shpFarmsum$ID. Therefore, if you want to create individual subsets for each ID this is one way to go:
for (i in unique(shpFarmsum$ID)) {
tempSubset shpFarmsum[shpFarmsum$ID == i,]
writeOGR(tempSubset, ".", paste0("subset_", i), driver = "ESRI Shapefile")
}
I added the line writeOGR(... so all subsets are written to your working direktory. However, you can change this line or add further analysis into the for-loop.
How it works
unique(shpFarmsum$ID) extracts all occuring IDs (compareable to your 1,2,3...n).
In each repetition of the for loop, another value of this IDs will be used to create a subset of the whole SpatialPolygonsDataFrame, which you can use for further analysis.

Resources