R returning new columns instead of new rows for additional records - r

I'm scraping this data from a website for practice, but I'm having an issue with my output: right now it returns multiple columns per link when there are multiple products for a link, whereas I want a unique row for every product.
I'm trying to find an easy way to take outputs like the one below and convert them so that every additional product column becomes another row, with the same link repeated in the first column. The ideal output is a data frame with three columns: one for the link, one for the product, and one for the price.
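In R this is a standard wide-to-long reshape; tidyr can do it in one call with pivot_longer(df, -link, names_to = c(".value", "n"), names_sep = "_"), assuming the repeated columns are named like product_1/price_1. Since the actual output isn't shown, here is a minimal sketch of the same reshape in Python/pandas with invented column names, just to illustrate the target shape:

```python
import pandas as pd

# Invented stand-in for the scraped output: one row per link,
# with repeated product_N / price_N column pairs.
wide = pd.DataFrame({
    "link":      ["example.com/a", "example.com/b"],
    "product_1": ["Widget", "Gadget"],
    "price_1":   [9.99, 19.99],
    "product_2": ["Sprocket", None],
    "price_2":   [4.99, None],
})

# Turn each product/price pair into its own row, keyed by link.
long = (
    pd.wide_to_long(wide, stubnames=["product", "price"],
                    i="link", j="n", sep="_")
      .dropna(subset=["product"])   # drop links with fewer products
      .reset_index()[["link", "product", "price"]]
)
print(long)
```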

Related

A way to search multiple lists and return which list(s) a value belongs to

I have a csv file with multiple lists. See picture. What I want to do is query every single value so it tells me which list(s) the value is found in.
E.g. I query the number 898774 and it tells me: 898774 - prim6 in set 1, set 2 and set 4.
I did find a quick workaround by making one big list in Excel, removing dupes and then manually searching for each number. Doable for a small amount but not great for thousands of sets.
I created a vector for each column and started a search with which(sapply(...)) but then remembered I needed the names. Just a little bit outside my knowledge.
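In R, the missing piece is wrapping the test in names(): names(which(sapply(df, function(x) 898774 %in% x))) returns the set names rather than their positions. The same lookup as a Python sketch, assuming a CSV with one column per set (file and column names invented):

```python
import pandas as pd

# Hypothetical CSV: one column per set, shorter sets padded with blanks.
df = pd.read_csv("sets.csv")   # columns like: set1, set2, set3, ...

# Build a value -> [set names] lookup by scanning each column once.
membership = {}
for col in df.columns:
    for val in df[col].dropna():
        membership.setdefault(val, []).append(col)

# Query a single value.
print(membership.get(898774, []))   # e.g. ['set1', 'set2', 'set4']
```

Building the dictionary once and querying it repeatedly scales to thousands of sets far better than re-searching every list per query.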

Teradata excluding columns from select

I'm curious how you handle instances where users want to exclude columns that hold sensitive data. I know explicitly listing the columns is an option, but what do you do when you have tables/views with 50-100+ columns?
Example: say I have a customer table with 50 columns and I want to exclude 15 of them from the select because they hold sensitive data. I'm aware of all the sensitive columns, as I have a separate view that specifies them. Are there any options for using this view to dynamically define the columns to select?
Appreciate any suggestions!
One possible solution I looked into was creating volatile tables and dropping the sensitive columns.
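Teradata has no SELECT * EXCEPT, so the usual workaround is dynamic SQL: read the table's column list from the DBC.ColumnsV catalog view, subtract the columns named in your sensitive-columns view, and assemble the SELECT from what's left. A rough sketch in Python with the teradatasql driver (the database, table, view name, and credentials below are all assumptions):

```python
import teradatasql  # official Teradata SQL driver (pip install teradatasql)

# All names and credentials here are placeholders; adjust to your system.
DB, TABLE = "mydb", "customer"
SENSITIVE_VIEW = "mydb.sensitive_columns"  # assumed: one ColumnName per row

con = teradatasql.connect(host="tdhost", user="me", password="***")
cur = con.cursor()

# Every column of the table, minus those named in the sensitive view.
cur.execute(
    "SELECT TRIM(ColumnName) FROM DBC.ColumnsV"
    " WHERE DatabaseName = ? AND TableName = ?"
    " AND ColumnName NOT IN (SELECT ColumnName FROM " + SENSITIVE_VIEW + ")",
    [DB, TABLE],
)
cols = [row[0] for row in cur.fetchall()]

# Assemble and run the dynamic SELECT.
cur.execute(f'SELECT {", ".join(cols)} FROM {DB}.{TABLE}')
for row in cur.fetchall():
    print(row)
con.close()
```

The same column-list subtraction could also live in a stored procedure with dynamic SQL if you'd rather keep it entirely in the database.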

How can I get the correlation between features and target variables using a user-defined function in Python?

I have two different datasets. One dataset has all the feature columns, like consumer price and GDP.
The other dataset has information on different customers' orders. I want to find the correlation of each customer's orders with the features. At the end I want to store the information in a dataframe where one column contains the customer name, the second column the feature name, and the third the correlation value.
I would be grateful if anyone could help me with this.
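One way to sketch this in pandas (the join key and all column names below are invented, since the actual datasets aren't shown): merge each customer's orders with the feature table on a shared date column, correlate the order quantity against each feature, and collect the results into the three-column frame described.

```python
import pandas as pd

# Toy stand-ins for the two datasets (real ones would be read from files).
features = pd.DataFrame({
    "date": pd.date_range("2023-01-01", periods=6, freq="MS"),
    "consumer_price": [100, 101, 103, 104, 107, 108],
    "gdp":            [2.0, 2.1, 2.0, 2.2, 2.3, 2.3],
})
orders = pd.DataFrame({
    "date": list(pd.date_range("2023-01-01", periods=6, freq="MS")) * 2,
    "customer": ["Acme"] * 6 + ["Globex"] * 6,
    "order_qty": [10, 12, 11, 15, 14, 16, 5, 4, 6, 5, 7, 6],
})

# For each customer, align orders with features and correlate.
rows = []
for customer, grp in orders.groupby("customer"):
    merged = grp.merge(features, on="date")
    for feat in ["consumer_price", "gdp"]:
        rows.append({
            "customer": customer,
            "feature": feat,
            "correlation": merged["order_qty"].corr(merged[feat]),
        })

result = pd.DataFrame(rows)  # columns: customer, feature, correlation
print(result)
```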

Tableau - Getting count from 2 different data sources and combining into one total

I am a Tableau newbie and am trying to see if this is possible. I have 2 separate data sources where the same employees are listed, one for closed cases and the other for open cases. These data sources share some of the same columns, but for the most part they are different.
Is it possible to aggregate the case count for each employee from the closed and open data sources into a single column? For instance, if an employee has 50 closed cases and 23 open cases, I want it to show 73 for them.
I attempted to play around with joins/unions but these didn't work properly and duplicated the data most of the time.
I think this is a great chance to leverage blends.
I have created a workbook with the Sample Superstore Excel dataset. This dataset has three sheets. I'll use the Orders and Returns sheets to demonstrate how we can calculate the net orders using blends.
The dataset I'm using can be found here.
Start by connecting to the Orders and Returns sheets separately. Once done with this step you should see the two data sources at the top of your data pane.
In this example, I'll calculate the Net Orders by Category. In your case, you're after the Total Cases by Employee, so just imagine Employee in place of Category.
Next, drag Category from the Orders data source onto the view, then switch to the Returns data source and click the chain icon to blend on Order ID.
You will need a common column between the two tables in order to blend.
Once blended I'll go back to the primary data source (indicated by the blue check mark) and create the Net Orders calculation.
This calculation uses the dot notation - similar to what you might see in SQL - to reference our other table.
To double check that our calculation is working properly, we can drag the components of this calculation onto the view and do the math.
Of course, once you are satisfied you can remove all but your blended calculation.
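For reference, the blended calculation might look something like COUNTD([Order ID]) - COUNTD([Returns].[Order ID]), where [Returns].[Order ID] is the dot notation reaching into the secondary data source (field and data source names here assume the Superstore sample).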
Blending isn't ideal in most cases but you could try it. Bring in each data source separately and "join" them within your workbook pane on Employee or hopefully an Employee_id. Click the little chain once you have them both loaded and you are on a worksheet tab. Then you could sum the counts by employee. Blending sometimes presents some issues with calculated fields across the two data sources but this is what I would try first.

SQLite Table Structure for Creating a 1:Many Index Relationship

Problem
Can't create a table with an index column that references multiple rows in a table. Picture example below of what I'm trying to create.
Overview
Imagine an SQLite table that will hold stock dividend payments. The index column is set to the ticker symbols. However, each ticker symbol refers to multiple records, which are organized by a timestamp. The documentation on SQLite and about 15 other tutorials all seem to focus on indexing where there is a 1:1 relationship between an index and a record. I would like to create an index with a 1:many relationship.
The lookup would find the appropriate stock by symbol, and then (probably) use a secondary index on the dates in the first column. But I cannot find any examples where others have tried to set up this structure, which makes me think maybe I don't have the right approach, or this is just a special case.
I don't think your problem is actually a problem. Putting an index on a column doesn't mean it has to contain unique values. It's perfectly reasonable for values in an indexed column to repeat. Of course there are diminishing returns, e.g. if you have a million rows and only five different values in a column, an index on that column isn't really going to do much for you.
A good rule of thumb is to start with an index on the column(s) you're using in your where clause. Then run the queries and see if you're getting satisfactory performance.
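A minimal sketch of that structure through Python's sqlite3 module (table and column names invented): a plain, non-unique index on (symbol, date) gives exactly the 1:many lookup described above.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- One row per (ticker, payment date); tickers repeat freely.
    CREATE TABLE dividends (
        symbol   TEXT NOT NULL,
        pay_date TEXT NOT NULL,   -- ISO-8601 date string
        amount   REAL NOT NULL
    );
    -- Non-unique composite index: one symbol maps to many rows, and
    -- lookups by symbol (or symbol + date range) can use the index.
    CREATE INDEX idx_dividends_symbol_date
        ON dividends (symbol, pay_date);
""")

con.executemany(
    "INSERT INTO dividends VALUES (?, ?, ?)",
    [("IBM", "2023-03-10", 1.65),
     ("IBM", "2023-06-09", 1.66),
     ("AAPL", "2023-05-12", 0.24)],
)

# The 1:many lookup: all payments for one ticker, newest first.
for row in con.execute(
    "SELECT pay_date, amount FROM dividends"
    " WHERE symbol = ? ORDER BY pay_date DESC", ("IBM",)
):
    print(row)
```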
