How to use Data Science Superpowers for Useless Things: Getting a Job at Amazon
IMPETUS: I’d like a job as a data scientist at Amazon.
PROBLEM: There are 3062 jobs listing “data scientist” somewhere in the title or description (at least this is the number of postings that comes back when I search “data science”)
SOLUTION: Open every single one and compare the requirements and job description to my resume and desires
PROBLEM: Assuming this takes me one minute per posting (which is generous – during trials it took me two minutes on average), this would take me 3062 minutes, or 51 hours or just over two days of uninterrupted job-post-reading.
[Ain’t nobody got time for that gif]
SOLUTION: Use our friend Beautiful Soup to scrape the webpage!
PROBLEM: They are using React and Beautiful Soup is trying to scrape before the whole document has been rendered! Que lastima!
SOLUTION: Do a complicated set of awaits. JUST KIDDING. Go straight to the source and find the XHR request that is generating all this data in the first place.
[SHOW CODE HERE]
PROBLEM: We now have a spreadsheet of 3062 job postings. How can we start to narrow this down?!
SOLUTION: Remove jobs outside of the US.
PROBLEM: We still have 2543 jobs to “look at.”
SOLUTION: Let’s help ourselves out by narrowing the search down by searching through ONLY the jobs that have the words “data scientist” in the title.
–HOORAY!! This brought us down to 556!! HOWEVER…
PROBLEM: We still have 556 jobs to “look at.”
SOLUTION: Look to see how many might be duplicates – or SLIGHT duplicates, meaning same requirements different departments – by seeing how many unique “basic qualification” lists we have.
–HOORAY!! This brought us down to 161!! An almost manageable number, right?
PROBLEM: That’s still a little over 2.5 hours assuming we are “reading” each at our “optimal” speed. For those of you counting, that is roughly the length of Man of Steel or Wonder Woman, both of which I need to watch this week in preparation for the Snyder Cut.
SOLUTION: NOW we can start to use NLP
WAIT WAIT!: A lot of these appear to be “senior” positions and while we might be “senior” in some things (owning cats, for example) we are not quite “senior” in data science, yet.
SO…
PROBLEM: A lot of positions are “senior”
SOLUTION: Remove any positions that include “senior” or “sr.” !
–HOORAY!! This brought us down from 642 to 286!!
Now, lets see how many of those 286 are unique!!
ONLY 72!!
NOTES:
SOLUTION: Investigate other ways of narrowing the search down before we start to apply NLP?
BRAINSTORM OTHER WAYS TO NARROW DOWN THE SEARCH: We could narrow down by city – but a lot is going remote these days. We could narrow down by team – but this isn’t an easily accessible category (yes, it’s there in an object under “team” but many still don’t actually have a team listed). But really, we should be narrowing down by the qualifications. I shouldn’t be spending any time looking at jobs requiring 5+ years of experience. I shouldn’t be looking at jobs requiring Hadoop. I shouldn’t be looking at jobs that say “data warehouse management.”
But this is taking a long time to list out the things I shouldn’t look at… right? What if there was a way that I could “read” through the job requirements and assign point values to each req based on the list of skills I have? What if there was a way that I could “read” through the job description and assign point values to each description based on what I’m looking for?
WELL DEAR FRIENDS. THANKFULLY, THERE IS.
BUT WAIT. First, let’s actuallllllly look at the titles of things