Using Cte To Identify Best Industries By Number Of Unicorns
Published:
In this project, I will be working with data from investment firm to analyze trends in high-growth companies. The goal is to identify industries which are producing the highest valuations and the rate at which new high-value companies are emerging. This data will provide insightful information to give investment firms a competitive insight about the ongoing industry trends and how they should structure their portfolio looking forward.
Skills : SQL, data cleaning, data exploration, common table expression
The dataset tables are as follows:
dates
| Column | Description |
|------------- |--------------------------------------------- |
| `company_id` | A unique ID for the company. |
| `date_joined` | The date that the company became a unicorn. |
| `year_founded` | The year that the company was founded. |
funding
| Column | Description |
|----------------- |--------------------------------------------- |
| `company_id` | A unique ID for the company. |
| `valuation` | Company value in US dollars. |
| `funding` | The amount of funding raised in US dollars. |
| `select_investors` | A list of key investors in the company. |
industries
| Column | Description |
|------------- |--------------------------------------------- |
| `company_id` | A unique ID for the company. |
| `industry` | The industry that the company operates in. |
companies
| Column | Description |
|------------- |-------------------------------------------------- |
| `company_id` | A unique ID for the company. |
| `company` | The name of the company. |
| `city` | The city where the company is headquartered. |
| `country` | The country where the company is headquartered. |
| `continent` | The continent where the company is headquartered. |
Here I will use CTE ( common table expression ) to first find top industries, then get the yearly ranking and then based on data I will write a query which will give top 3 industries for the years 2019, 2020, 2021 ordered by the calendar year, and number of unicorns
--- finding top industries
WITH top_industry AS (
SELECT i.industry, COUNT(i.*) AS company_count
FROM industries AS i
INNER JOIN dates AS d
ON i.company_id = d.company_id
WHERE EXTRACT(year FROM d.date_joined) IN ('2019', '2020', '2021')
GROUP BY industry
ORDER BY company_count DESC
limit 3
),
--Gathering yearly ranking
yearly_ranking AS (
SELECT COUNT(i.*) AS num_unicorns,
i.industry,EXTRACT(year FROM d.date_joined) AS year,
AVG(f.valuation) AS avg_valuation
FROM industries AS i
INNER JOIN dates AS d
ON i.company_id = d.company_id
INNER JOIN funding AS f
ON i.company_id = f.company_id
GROUP BY industry, year
)
SELECT industry, year, num_unicorns,
ROUND(AVG(avg_valuation)/1000000000,2) AS average_valuation_billions
FROM yearly_ranking
WHERE year IN (2019,2020, 2021) AND industry IN (SELECT industry FROM top_industry)
GROUP BY industry, year, num_unicorns
ORDER BY year, num_unicorns;
industry | year | num_unicorns | average_valuation_billions | |
---|---|---|---|---|
0 | E-commerce & direct-to-consumer | 2019 | 12 | 2.58 |
1 | Internet software & services | 2019 | 13 | 4.23 |
2 | Fintech | 2019 | 20 | 6.80 |
3 | Fintech | 2020 | 15 | 4.33 |
4 | E-commerce & direct-to-consumer | 2020 | 16 | 4.00 |
5 | Internet software & services | 2020 | 20 | 4.35 |
6 | E-commerce & direct-to-consumer | 2021 | 47 | 2.47 |
7 | Internet software & services | 2021 | 119 | 2.15 |
8 | Fintech | 2021 | 138 | 2.75 |