I have a habit of posting solutions to the LeetCode problems I solve on my GitHub repository.
But I had to manually copy the question data from the LeetCode website & paste it into a file that follows a certain format.
The data I had to copy manually included:
Question ID
Question Title
Question Difficulty
Problem Statement
Example Test Cases
Constraints
So, I thought of automating this process by writing a script that would scrape the question data from the LeetCode website.
Note: This solves only the first part of the automation problem that I have. The second part is to automate the process of creating the solution file & copying the question data into it.
My initial approach was to use Selenium to scrape the data from the URL of the LeetCode problem. But I hit a roadblock: I couldn't find a clean way to extract the data from the page source.
The only option I could think of was targeting the HTML elements directly, but the page source was dynamically generated & chasing the right divs & classes was a burden.
Then I came across a StackOverflow answer which suggested using simple POST requests to fetch the dynamic content of the page using the URL slug.
I'd like to thank that stranger.
So, the approach is to send a POST request, which we can do with the requests library in Python. The request returns a JSON response containing the question data.
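Here's a minimal sketch of that request, assuming LeetCode's public GraphQL endpoint at https://leetcode.com/graphql and a question query keyed by the problem's URL slug (the exact field names may differ if the API changes):

```python
import requests

LEETCODE_GRAPHQL_URL = "https://leetcode.com/graphql"

def fetch_question(title_slug):
    # The slug is the last part of the problem URL,
    # e.g. "two-sum" for https://leetcode.com/problems/two-sum/
    query = """
    query questionData($titleSlug: String!) {
      question(titleSlug: $titleSlug) {
        questionId
        title
        difficulty
        content
      }
    }
    """
    response = requests.post(
        LEETCODE_GRAPHQL_URL,
        json={"query": query, "variables": {"titleSlug": title_slug}},
    )
    response.raise_for_status()
    return response.json()

data = fetch_question("two-sum")
```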
I don't freaking want the entirety of the JSON response. What am I supposed to do with companyTagStats, judgeType, mysqlSchemas, etc.?
My use-case was simple: get the questionId, title, difficulty & content, which I would then pass on to another file.
So, I simply extracted the required fields from the JSON response one by one:
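A rough sketch of that extraction, assuming the response shape returned by the query above:

```python
# Pull out only the fields I care about; everything else in the
# response (companyTagStats, judgeType, mysqlSchemas, ...) is ignored.
question = data["data"]["question"]

question_id = question["questionId"]
title = question["title"]
difficulty = question["difficulty"]
content = question["content"]  # HTML string of the problem statement
```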
What the heck is u'\xa0'? It is the non-breaking space character in Unicode, which kept showing up in the HTML content.
We don't want any unnecessary characters in our content (they may break the output at any time), so I replaced it with a normal space.
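The fix is a one-liner, shown here against the content variable from the sketch above:

```python
# Replace the non-breaking space (U+00A0) with a regular space
# so the content is safe to write into the solution file.
content = content.replace(u"\xa0", " ")
```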
Stick around for the next part, where I'll automate the process of creating the solution file & copying the question data into it. I also maintain a README file which acts as a log of all my solutions; I'll automate updating that as well.