daily log 03-03-21

PROBLEM:

AWS has a zillion jobs with the title “Data Scientist” – YET the descriptions and “Must Have Qualifications” all differ depending on which department needs the data scientist…

SOLUTION?:

Attempt to scrape using BeautifulSoup
Realize it’s a React site (so the content is rendered by JavaScript and isn’t all there by the time soup is scraping)
Attempt to delay the scraping (i.e. please wait until all components are loaded before scraping) – this did not work
Sigh, realize it’s silly to try to scrape when we can just USE THE NETWORK REQUESTS DUH
Find the JSON populating this page…
Use that…
Do this…

import json
import urllib.request

import pandas as pd

all_jobs = []

def get_data(i):
    # Same search.json endpoint the page itself calls; `offset` pages through results 10 at a time.
    url = "https://www.amazon.jobs/en/search.json?radius=24km&facets[]=location&facets[]=business_category&facets[]=category&facets[]=schedule_type_id&facets[]=employee_class&facets[]=normalized_location&facets[]=job_function_id&offset={}&result_limit=10&sort=relevant&latitude=&longitude=&loc_group_id=&loc_query=&base_query=data%20scientist&city=&country=&region=&county=&query_options=&".format(i)
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode())

def add_data_to_df(data):
    # Collect the raw job dicts; the DataFrame gets built once at the end.
    for job in data['jobs']:
        all_jobs.append(job)

def do_the_thing():
    i = 0  # start at offset 0 so the first page of results isn't skipped
    # while i < 21:  # for testing lololol
    while i < 3071:
        data = get_data(i)
        add_data_to_df(data)
        i += 10

do_the_thing()  # actually run the scrape before building the DataFrame
df = pd.DataFrame(all_jobs)
df.to_csv('aws_jobs.csv')
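
The 3071 above is hard-coded from whatever the result count was that day. A rough sketch of one way around that: keep asking for pages until one comes back with no jobs. This ASSUMES the endpoint returns an empty 'jobs' list once the offset runs past the last result, which I haven't verified. `iter_jobs` and `fake_get_data` are made-up names for illustration; the fake one stands in for `get_data` so the sketch runs offline.

```python
from typing import Callable, Dict, Iterator


def iter_jobs(fetch_page: Callable[[int], Dict], page_size: int = 10) -> Iterator[Dict]:
    """Yield jobs page by page until a page comes back with no 'jobs'."""
    offset = 0
    while True:
        data = fetch_page(offset)
        # Assumption: past the last result, 'jobs' is empty (or missing).
        jobs = data.get('jobs') or []
        if not jobs:
            break
        yield from jobs
        offset += page_size


# Hypothetical stand-in for get_data: 25 fake jobs served 10 at a time.
FAKE_JOBS = [{'id': n} for n in range(25)]


def fake_get_data(offset: int) -> Dict:
    return {'jobs': FAKE_JOBS[offset:offset + 10]}


collected = list(iter_jobs(fake_get_data))
print(len(collected))
```

With the real thing, `list(iter_jobs(get_data))` would take the place of the `while i < 3071` loop, provided the assumption about the empty last page holds.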

Oh yeah, what team are they??

def get_label(team):
    # `team` should be a dict with a 'label' key; fall back if it's missing or not a dict.
    try:
        return team['label']
    except (KeyError, TypeError):
        return 'no label'

df['team-label'] = df['team'].apply(get_label)
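
Once every job has a team label, the obvious next question is how many postings each team has. A minimal sketch on made-up rows, using only the standard library so it runs without the scraped data. The sample jobs, the 'title' key and the team names are hypothetical; only the 'team' / 'label' shape comes from the code above.

```python
from collections import Counter


def get_label(team):
    # Same fallback as above: anything without a usable 'label' becomes 'no label'.
    try:
        return team['label']
    except (KeyError, TypeError):
        return 'no label'


# Hypothetical rows shaped like entries of data['jobs']; the team names are made up.
sample_jobs = [
    {'title': 'Data Scientist', 'team': {'label': 'team-a'}},
    {'title': 'Data Scientist', 'team': {'label': 'team-b'}},
    {'title': 'Data Scientist', 'team': {'label': 'team-a'}},
    {'title': 'Data Scientist', 'team': None},
]

# Count postings per team label, most common first.
team_counts = Counter(get_label(job.get('team')) for job in sample_jobs)
print(team_counts.most_common())
```

On the real DataFrame, `df['team-label'].value_counts()` should give the same kind of tally.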

Daniel Caraway
