LeetCode 2199. Finding the Topic of Each Post SQL Solution
Problem
LeetCode SQL Problem
- Finding the Topic of Each Post
Posts table
| post_id | content |
|---|---|
| 1 | We call it soccer They call it football hahaha |
| 2 | Americans prefer basketball while Europeans love handball and football |
| 3 | stop the war and play handball |
| 4 | warning I planted some flowers this morning and then got vaccinated |
Keywords table
| topic_id | word |
|---|---|
| 1 | handball |
| 1 | football |
| 3 | WAR |
| 2 | Vaccine |
Firstable, we will split each post record into multiple rows to find out each post's topics. We will then combine each post's distinct topics to a comma-delimited string.
Solution
MySQL
- Use CHAR_LENGTH() and REPLACE() functions to find out the maximum number of words we need to split.
- Create a
Numberstable dynamically in the memory using recursive CTE which contains numbers from 1 to the maximum number of words to we need to split - INNER JOIN
NumbersandPoststables and utilize nested SUBSTRING_INDEX() calls to split each post into multiple keyword rows. - LEFT JOIN
PostKeywordsandKeywordstables to output all posts including ones that do not belong to any topic. - Utilize GROUP_CONCAT() function to generated comma-delimited distinct topic string for each post.
- For posts that does not have any topic, GROUP_CONCAT() result will be NULL. In this case, we'll output 'Ambiguous!'.
TSQL
- Utilize Cross Apply and STRING_SPLIT() function to split each post into multiple keyword rows.
- LEFT JOIN
PostKeywordsandKeywordstables to output distinctpost_idandtopic_idpairs including ones that do not belong to any topic. - Utilize STRING_AGG() function to generated comma-delimited distinct topic string for each post.
- For posts that does not have any topic, STRING_AGG() result will be NULL. In this case, we'll output 'Ambiguous!'.
- MySQL
- TSQL
SET @maxWords := (
SELECT max(CHAR_LENGTH(content) - CHAR_LENGTH(REPLACE(content, ' ', '')))
FROM Posts
);
WITH recursive Numbers (n)
AS (
SELECT 1
UNION
SELECT n + 1
FROM Numbers
WHERE n <= @maxWords
),
PostKeywords
AS (
SELECT *,
SUBSTRING_INDEX(SUBSTRING_INDEX(p.content, ' ', Numbers.n), ' ', - 1) AS keyword
FROM Numbers
INNER JOIN Posts p ON CHAR_LENGTH(p.content) - CHAR_LENGTH(REPLACE(p.content, ' ', '')) + 1 >= Numbers.n
)
SELECT pk.post_id,
IFNULL(GROUP_CONCAT(DISTINCT k.topic_id ORDER BY topic_id SEPARATOR ','), 'Ambiguous!') AS topic
FROM PostKeywords pk
LEFT JOIN Keywords k ON pk.keyword = k.word
GROUP BY pk.post_id;
WITH PostKeywords
AS (
SELECT P.*
,value AS keyword
FROM Posts AS P
CROSS APPLY string_split(P.content, ' ')
)
,DistinctPostTopicPairs
AS (
SELECT DISTINCT PK.post_id
,K.topic_id
FROM PostKeywords AS PK
LEFT JOIN Keywords AS K ON PK.keyword = K.word
)
SELECT DPT.post_id
,ISNULL(STRING_AGG(DPT.topic_id, ',') WITHIN GROUP (
ORDER BY DPT.topic_id ASC
), 'Ambiguous!') AS topic
FROM DistinctPostTopicPairs AS DPT
GROUP BY DPT.post_id;
Query Output
| post_id | topic |
|---|---|
| 1 | 1 |
| 2 | 1 |
| 3 | 1,3 |
| 4 | Ambiguous! |