How to merge the headers of multiple columns in HTSQL

I am building a table in HTSQL, which currently produces output like this:
Name  Total_Calls  Successful_Calls  Failed_Calls
John  9            5                 4
whereas I want to merge the headers like this:
      C A L L S
Name  Total  Successful  Failed
John  9      5           4

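You can use :as to rename a column (for example, total_calls to total),
and then wrap the related columns in braces to mark them as a nested
record, which can itself be given a title with :as. Here is an example:
http://demo.htsql.org/student%7Bid,%20start_date,%20%7Bname%20:as%20full_name,%20gender%20:as%20sex,%20dob%20:as%20birth_date%7D%20:as%20personal_information%20%7D
Decoded, that query reads:
/student{id, start_date, {name :as full_name, gender :as sex, dob :as birth_date} :as personal_information }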

Related

Renaming a specific column of each dataframe in a list as part of a loop

Basically, I would like to create multiple data frames in a loop and add each one to a list after renaming its second column. Sample code is below.
The problem is that I want to rename the second column of each data frame to the loop variable, but I haven't managed to yet.
### create a blank list
temp_list<-vector("list",)
### create a vector with values to use in the loop
temp_years<-c("2010","2011")
### loop to generate the data frames, add each one to the list above
### under the name i, then rename column 2 of each data frame
### to the loop variable (i)
for (i in temp_years){
  temp_df<-data.frame(coltest11=runif(4),coltest12=runif(4))
  temp_list[[i]]<-temp_df
  names(temp_list[[i]][2])<-i
}
Desired output (note the column titles):
$`2010`
coltest11 **2010**
1 0.4781636 0.28835747
2 0.1413173 0.84415993
3 0.6564438 0.01405185
4 0.3046113 0.83951115
$`2011`
coltest11 **2011**
1 0.8050338 0.2284567
2 0.3049061 0.8308597
3 0.2920562 0.8118845
4 0.3452323 0.9222456
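The problem is where the index sits. names(temp_list[[i]][2]) <- i renames an extracted one-column copy, and when R writes that copy back into column 2, the column keeps its existing name. Index the result of names() instead: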
### create a blank list
temp_list<-list()
### create a vector with values to use in the loop
temp_years<-c("2010","2011")
#Loop
for (i in temp_years){
  temp_df<-data.frame(coltest11=runif(4),coltest12=runif(4))
  temp_list[[i]]<-temp_df
  names(temp_list[[i]])[2]<-i
}
Output:
$`2010`
coltest11 2010
1 0.1481673 0.5234788
2 0.4055919 0.5426163
3 0.2353523 0.5847577
4 0.5258541 0.6792990
$`2011`
coltest11 2011
1 0.4292431 0.7717647
2 0.2160180 0.4033482
3 0.8142830 0.6944202
4 0.5900886 0.4449840
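The difference between the two index placements can be seen on a small data frame. This is a minimal sketch; df and the variables names_after_wrong and names_after_right are illustrative names that do not appear in the question.

```r
df <- data.frame(a = 1:2, b = 3:4)

# Index inside names(): the rename is applied to an extracted copy of
# column 2, and the column in df keeps its original name.
names(df[2]) <- "x"
names_after_wrong <- names(df)

# Index outside names(): the second element of df's own names vector
# is replaced, so the column is actually renamed.
names(df)[2] <- "x"
names_after_right <- names(df)

print(names_after_wrong)
print(names_after_right)
```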
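Alternatively, add colnames(temp_df)[2] <- i to your loop, before the data frame is stored in the list. The original names(temp_list[[i]][2]) <- i line is then no longer needed:
temp_list<-list()
temp_years<-c("2010","2011")
for (i in temp_years){
  temp_df<-data.frame(coltest11=runif(4),coltest12=runif(4))
  # rename the second column before storing the data frame
  colnames(temp_df)[2] <- i
  temp_list[[i]]<-temp_df
}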
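The same result can also be built without an explicit loop. The following is only a sketch of one possible alternative, not something either answer proposed; make_year_df is a hypothetical helper name.

```r
temp_years <- c("2010", "2011")

# Hypothetical helper: build one data frame and name its second
# column after the year it belongs to.
make_year_df <- function(year) {
  df <- data.frame(coltest11 = runif(4), coltest12 = runif(4))
  names(df)[2] <- year
  df
}

# One data frame per year; setNames() labels the list elements.
temp_list <- setNames(lapply(temp_years, make_year_df), temp_years)

str(temp_list)
```

Keeping the rename inside the helper means every data frame is already in its final shape before it enters the list.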

Search and term variables by character string

I have a problem that may be simple, but I can't solve it.
I have two lists. List A is empty and list B has several named columns. Now I want to select a column of B by a variable and put it in list A, roughly as shown in this example:
A<-list()
B<-list()
VAR<-"a"
B$a<-c(1:10)
B$b<-c(10:20)
B$c<-c(20:30)
# This of course doesn't work...
A$VAR<-B$VAR
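The $ operator takes the name after it literally, so A$VAR looks for an entry called "VAR". Double brackets evaluate the variable, so you can extract the entry with B[[VAR]] and assign it the same way (A[[VAR]] <- newEntry); wrapping the name in get("VAR") also works but is not needed:
A[[VAR]] <- B[[VAR]]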
## A list
# $a
# [1] 1 2 3 4 5 6 7 8 9 10
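As a self-contained sketch of the same idea, the block below rebuilds B from the question and also shows single-bracket indexing for copying several entries at once. The names vars and A_multi are illustrative and not part of the original question.

```r
B <- list(a = 1:10, b = 10:20, c = 20:30)
VAR <- "a"

# [[ ]] evaluates VAR, so this copies the entry named "a".
A <- list()
A[[VAR]] <- B[[VAR]]

# Single brackets accept a character vector and return a sub-list,
# which is handy when several entries are selected by name.
vars <- c("a", "c")
A_multi <- B[vars]

str(A)
str(A_multi)
```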

R - How to delete rows before a certain phrase and after a certain phrase in a single column

Hi, I would like to delete the rows before a certain phrase and the rows after (almost) the same phrase, which appears later on. Put another way, I want to keep only the data between the start and the end of a certain section.
My data is as follows:
df <- data.frame(time = as.factor(c(1,2,3,4,5,6,7,8,9,10,11,12,13)),
type = c("","","GMT:yyyy-mm-dd_HH:MM:SS_LT:2016-10-18_06:09:53","(K)","","","","(K)","(K)","","(K)","GMT:yyyy-mm-dd_HH:MM:SS_CAM:2016-10-18_06:20:03",""),
names = c("J","J","J","J","J","J","J","J","J","J","J","J","J"))
I would like to delete everything before the first GMT:yyyy... phrase and everything after the second GMT:yyyy... phrase, so the end product would be
time type names
3 GMT:yyyy-mm-dd_HH:MM:SS_LT:2016-10-18_06:09:53 J
4 (K) J
5 J
6 J
7 J
8 (K) J
9 (K) J
10 J
11 (K) J
12 GMT:yyyy-mm-dd_HH:MM:SS_LT:2016-10-18_06:20:03 J
I thought subset might work but it is giving me problems.
Using grep, you can locate the indexes of the rows where your pattern is found:
ind=grep("^GMT",df$type)
Then you can keep only the rows between the two indexes:
df=df[ind[1]:ind[2],]
Alternatively, with the tidyverse (stringr is loaded explicitly here for str_which):
library(tidyverse)
library(stringr)
df2 <- df %>% slice(str_which(type, "GMT")[1]:str_which(type, "GMT")[2])
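Both answers assume the pattern occurs at least twice. A small base R sketch that makes this assumption explicit could look like the following; keep_between is a hypothetical helper name, and the data frame is copied from the question.

```r
df <- data.frame(
  time = as.factor(1:13),
  type = c("", "", "GMT:yyyy-mm-dd_HH:MM:SS_LT:2016-10-18_06:09:53", "(K)",
           "", "", "", "(K)", "(K)", "", "(K)",
           "GMT:yyyy-mm-dd_HH:MM:SS_CAM:2016-10-18_06:20:03", ""),
  names = rep("J", 13)
)

# Hypothetical helper: keep the rows from the first to the second row
# whose `column` value matches `pattern`, and fail loudly otherwise.
keep_between <- function(data, column, pattern) {
  hits <- grep(pattern, data[[column]])
  if (length(hits) < 2) {
    stop("pattern must match at least two rows")
  }
  data[hits[1]:hits[2], , drop = FALSE]
}

trimmed <- keep_between(df, "type", "^GMT")
print(trimmed)
```

Stopping on fewer than two matches avoids the silent NA index that ind[1]:ind[2] would otherwise produce.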

Filtering Rows matching String condition in R

I've got some problems filtering for duplicate elements in a string. My data look similar to this:
idvisit path
1 1,16,23,59,16
2 2,14,19,14
3 5,19,23
4 10,21
5 23,27,29,23
I have a column containing a unique ID and a column containing a path of web page navigation. The path column contains some cases where a page was accessed two or more times, with other pages visited in between. I want to filter() the rows where a page occurs two or more times and at least one other page lies between the two accesses, so the data should look like this:
idvisit path
1 1,16,23,59,16
2 2,14,19,14
5 23,27,29,23
I just want to keep the rows that match these conditions. I really don't know how to handle a string like this, where a pattern has to stand in for many different numbers.
You can filter on the number of elements in each path. A path with duplicated entries has more elements than it has unique elements, i.e.
df1[sapply(strsplit(as.character(df1$path), ','), function(i) length(unique(i)) != length(i)),]
# idvisit path
#1 1 1,16,23,59,16
#2 2 2,14,19,14
#5 5 23,27,29,23
We can try
library(data.table)
lst <- strsplit(df1$path, ",")
df1[lengths(lst) != sapply(lst, uniqueN),]
# idvisit path
#1 1 1,16,23,59,16
#2 2 2,14,19,14
#5 5 23,27,29,23
Or an option using tidyverse
library(tidyverse)
separate_rows(df1, path) %>%
group_by(idvisit) %>%
filter(n_distinct(path) != n()) %>%
summarise(path = toString(path))
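You could also try a regular expression with grepl. The word boundaries (\b) stop the back-reference from matching only part of a page number, and the two commas require at least one page between the repeats:
df1[grepl('\\b([0-9]+),.*,\\1\\b', as.character(df1$path)),]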
# idvisit path
#1 1 1,16,23,59,16
#2 2 2,14,19,14
#5 5 23,27,29,23
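The length-based answers above flag any repeated page, including one visited twice in a row. If the "at least one page in between" condition matters, it can be checked on the positions of each page. This is a sketch under that reading of the question; has_separated_repeat is a hypothetical helper name and df1 is rebuilt from the sample data.

```r
df1 <- data.frame(
  idvisit = 1:5,
  path = c("1,16,23,59,16", "2,14,19,14", "5,19,23", "10,21", "23,27,29,23"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE when some page occurs at two positions
# that are more than one step apart in the path.
has_separated_repeat <- function(path) {
  pages <- strsplit(path, ",", fixed = TRUE)[[1]]
  positions <- split(seq_along(pages), pages)
  any(vapply(positions, function(p) max(p) - min(p) > 1, logical(1)))
}

keep <- vapply(df1$path, has_separated_repeat, logical(1), USE.NAMES = FALSE)
result <- df1[keep, ]
print(result)
```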

Create list of one sub-column based on another column

I have a data set that looks like:
Files Batch
filepath1.txt One
filepath2.txt One
filepath3.txt One
filepath4.txt One
filepath5.txt two
filepath6.txt two
filepath7.txt two
filepath8.txt two
I want to loop over the full data set (which has a dozen "Batch" categories) by grouping the "Files" according to the "Batch" they are in, and storing the groups in a new variable called "batch", i.e.
batch[1]
filepath1.txt
filepath2.txt
filepath3.txt
filepath4.txt
batch[2]
filepath5.txt
filepath6.txt
filepath7.txt
filepath8.txt
How do I do this for all my Batch groups in the full data-set?
The split function seems to be what you're looking for.
> dat <- data.frame(File = paste0("file", 1:10, ".txt"), Batch = rep(c("one", "two"), each = 5))
> dat
File Batch
1 file1.txt one
2 file2.txt one
3 file3.txt one
4 file4.txt one
5 file5.txt one
6 file6.txt two
7 file7.txt two
8 file8.txt two
9 file9.txt two
10 file10.txt two
> split(dat, dat$Batch)
$one
File Batch
1 file1.txt one
2 file2.txt one
3 file3.txt one
4 file4.txt one
5 file5.txt one
$two
File Batch
6 file6.txt two
7 file7.txt two
8 file8.txt two
9 file9.txt two
10 file10.txt two
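If only the file names are needed per batch, as in the question's batch[1] and batch[2] sketch, split can be applied to the File column alone. A minimal sketch, assuming the same dat as above; the loop body is only a placeholder for whatever processing each batch needs.

```r
dat <- data.frame(
  File = paste0("file", 1:10, ".txt"),
  Batch = rep(c("one", "two"), each = 5),
  stringsAsFactors = FALSE
)

# A named list with one character vector of file names per batch.
batch <- split(dat$File, dat$Batch)

# Loop over the batches; each element holds that batch's files.
for (name in names(batch)) {
  cat(name, ":", length(batch[[name]]), "files\n")
}
```

Elements can then be accessed by position (batch[[1]]) or by name (batch[["one"]]).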
