
A4—Altair basic charts

Due 2023-10-06, 11:59pm EST (16 pts)

Follow all the instructions below.

Please post any questions about this assignment on Slack.

Warning: This is an individual assignment.


Change log

  • N/A.

Aim of the assignment

Altair is a declarative statistical visualization library for Python, based on Vega and Vega-Lite. In this assignment, you will learn to create two basic visualizations using Altair. Then, you will customize those visualizations in Inkscape or Illustrator to make them more useful to an end reader.

Instructions

If you run into problems, see the tips, tricks, and troubleshooting section below.

Please look through all the materials below so you understand the setup instructions; how to run, organize, and submit your code; and our requirements for the visualizations.

Setup instructions

  1. Accept the GitHub Classroom assignment invitation by clicking this link to get your repository:

    https://classroom.github.com/a/KtPF2nbY

    For reference, this is the template repository your repository is being created from: https://github.com/NEU-DS-4200-F23/A4--Altair_basic_charts.

    Recall our general instructions and policies on GitHub Classroom assignments.

  2. Clone your GitHub-Classroom-generated repository to your local machine.

    E.g., in your terminal / command prompt, CD to the location where you want this activity’s folder to live. Then run: git clone <YOUR_REPO_URL>

  3. CD into the cloned folder, or open a terminal / command prompt window there.
  4. Create and activate a virtual environment for this project by running the setup commands below. You may need to adapt them depending on which Python you have installed and how your machine is configured.

    • On macOS or Linux, run these three commands separately in case there are errors:

        python3 -m venv env
      
        source env/bin/activate
      
        which python
      
    • On Windows (Command Prompt), run these three commands separately in case there are errors. In PowerShell, activate with .\env\Scripts\Activate.ps1 instead of activate.bat:

        python -m venv env
      
        .\env\Scripts\activate.bat
      
        where.exe python
      

    Check the path(s) printed by which python or where.exe python. The first one listed should be inside the env folder you just created.

  5. Install necessary packages.

     python -m pip install -r requirements.txt
    

    This may take a few minutes.
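To double-check steps 4 and 5 from Python itself (for example by saving this as check_env.py and running python check_env.py, or in a notebook cell later), here is a minimal standard-library sketch. It relies on the fact that in a standard venv, sys.prefix points at the environment folder while sys.base_prefix still points at the Python the environment was created from. The file name, the helper names, and the package list are illustrative assumptions; compare the list with what requirements.txt actually pins.

```python
import sys
from importlib import metadata


def in_virtualenv():
    """True when running inside a venv: sys.prefix then differs from sys.base_prefix."""
    return sys.prefix != sys.base_prefix


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if it is missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print("Interpreter:", sys.executable)
    print("Inside a virtual environment:", in_virtualenv())
    # Hypothetical package list; compare it with requirements.txt.
    for name, version in installed_versions(["altair", "pandas", "jupyterlab", "jupytext"]).items():
        print(f"{name:12} {version or 'NOT INSTALLED'}")
```

If the interpreter path is outside env, or a package shows NOT INSTALLED, re-run the activation and install steps before starting Jupyter Lab.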

Run Jupyter Lab and create a notebook

  1. Run python -m jupyter lab. It should open Jupyter Lab in your default browser.
  2. Create a new Jupyter Notebook named netflix.ipynb.

Load the dataset

The data we will be using is included in your template repository in the file netflix_titles.csv. It contains the Netflix Movies and TV Shows dataset from Kaggle, which includes all the TV shows and movies added to Netflix before January 17, 2020.

  1. In your notebook, load the CSV file.
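One way to do this with pandas, as a sketch: it assumes the column names of the Kaggle release (notably date_added, stored as text such as "September 9, 2019"), so check df.columns against your copy. The helper names load_netflix and add_date_columns are hypothetical. Parsing dates once up front makes time-based questions, such as how many movies were added each year, easier to chart later.

```python
import pandas as pd


def add_date_columns(df):
    """Parse date_added (e.g. "September 9, 2019") and derive a year_added column."""
    df = df.copy()
    # Strip stray whitespace; values that are missing or don't match become NaT.
    df["date_added"] = pd.to_datetime(
        df["date_added"].str.strip(), format="%B %d, %Y", errors="coerce"
    )
    # Nullable integer year, so missing dates stay <NA> instead of turning into floats.
    df["year_added"] = df["date_added"].dt.year.astype("Int64")
    return df


def load_netflix(path="netflix_titles.csv"):
    """Read the CSV shipped in the template repository and add the parsed columns."""
    return add_date_columns(pd.read_csv(path))
```

In a notebook cell you would then run netflix = load_netflix() and look at netflix.head() and netflix.info() to see what each column contains.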

Create two visualizations and discussions of them to answer questions

  1. Using Altair, create two visualizations that display interesting insights from the Netflix Movies and TV Shows dataset, and write a discussion of each. Use a pandas DataFrame as the data source; see the documentation on specifying data in Altair.

    Here are some possible questions you can answer:

    • Does the distribution of movie lengths appear Gaussian?
    • Who are the directors with the highest average movie rating?
    • Has the rate at which Netflix adds new movies changed through the years?

    Here are the requirements:

    • Each of the two visualizations must be created by a separate code cell in the notebook.
    • Each visualization must be followed by a Markdown cell that explains the visualization, your choices, and the point you are trying to convey.
    • You are free to create any visualization that you like. However, you must explain the reasoning behind your choices, and the visualization must be appropriate for the information you are attempting to learn or convey. (See readings and lecture slides for more details.)

Create an infographic using one (or both) of your visualizations

  1. Export one (or both) of your visualizations as SVG. Recall that Altair lets you save an SVG file directly using the menu in the top-right of a visualization.
  2. Load the SVG file(s) in Inkscape (free and open source, and shown in class) or in Adobe Illustrator (proprietary, but available to Northeastern students) and create an infographic using them. With your graphic, you should aim to:
    1. better answer one or more of the questions
    2. provide clearer and more appropriate visual encodings (e.g., better labels, fixed placement, etc.)
    3. provide additional contextual information and graphics.

    Note that you should credit the authors of any materials you include and respect the licenses of those materials.

Include your visualizations and infographic in README.md as PNGs

  1. Export both your visualizations and your infographic as PNG files and embed them in README.md so they are displayed when we browse to your GitHub repo. That way, even if we are unable to run your notebook, we can still see the output.

    Please see the GitHub Markdown documentation for how to include an image. Note that Altair lets you save a PNG file directly using the menu in the top-right of a visualization and Inkscape has a PNG export feature.

When you are done…

Optionally clear your output

  1. I used to warn folks to clear the outputs of all cells before committing .ipynb files. This decreases file size, removes unnecessary metadata, and makes diffs easier to understand. In Jupyter Lab you can do this through the GUI: Edit->Clear All Outputs. Here, though, we’re using the jupytext package to automatically create a twin .py file that you can run with normal Python, and that file is much easier to diff.
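If you do still want to strip outputs outside the GUI, an .ipynb file is plain JSON, and in nbformat 4 each code cell keeps its results under "outputs" and "execution_count". Below is a standard-library sketch; clear_outputs and clear_notebook_file are hypothetical names, and the sketch leaves cell metadata untouched.

```python
import json
from pathlib import Path


def clear_outputs(nb):
    """Return a copy of a parsed notebook with all code-cell outputs removed."""
    nb = json.loads(json.dumps(nb))  # deep copy; the input dict is left untouched
    for cell in nb.get("cells", []):
        if cell.get("cell_type") == "code":
            cell["outputs"] = []
            cell["execution_count"] = None
    return nb


def clear_notebook_file(path):
    """Rewrite an .ipynb file in place with all code-cell outputs cleared."""
    p = Path(path)
    nb = json.loads(p.read_text(encoding="utf-8"))
    # indent=1 and sorted keys roughly match how Jupyter writes notebooks,
    # which keeps the resulting git diff small.
    text = json.dumps(clear_outputs(nb), indent=1, sort_keys=True, ensure_ascii=False)
    p.write_text(text + "\n", encoding="utf-8")
```

Close the notebook in Jupyter Lab first, so it does not save its in-memory copy back over the file, then call clear_notebook_file("netflix.ipynb").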

Quit Jupyter Lab and the virtual environment

  1. Make sure to save your .ipynb file and shut down Jupyter Lab properly through the File menu. Otherwise, you will need to stop it with jupyter notebook stop.
  2. Run deactivate to leave the virtual environment and return to your normal terminal.

Commit and push your code (but first…)

  1. Only if you have made any changes to the required packages (you probably didn’t), first export a list of all installed packages and their versions:

    pip freeze > requirements.txt
    
  2. Make sure to add all your required files to the Git repo, including the .ipynb file and any PNG and SVG images.

  3. Finally, commit all your local files and push them to the remote repository on GitHub.

Submission instructions

  1. Ensure that:

    1. Both visualizations and prose are present in your notebook, and both visualizations and your infographic are present in README.md.
    2. All of your required files including netflix.ipynb and any images are committed and pushed to the remote repository on GitHub which was generated by GitHub Classroom. We will grade based on what is available in that repository.
  2. Submit the URL of your repository to the assignment A4—Altair basic charts in GradeScope.

    Warning: Do not put a link to a personal repository. It must be within our class GitHub organization.
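For the README half of this checklist, a small standard-library sketch can list the PNGs that README.md embeds and flag any that are missing from the repository folder. It assumes images are embedded with Markdown syntax such as ![Visualization 1](viz1.png) using relative paths; HTML <img> tags and remote URLs are not checked, and the helper names are hypothetical. It cannot tell whether the files are committed and pushed; git status shows that.

```python
import re
from pathlib import Path

# Markdown image syntax: ![alt text](path/to/file.png "optional title")
MD_PNG_IMAGE = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+\.png)", re.IGNORECASE)


def png_images_in_markdown(text):
    """Return PNG paths referenced as Markdown images, in order of appearance."""
    return MD_PNG_IMAGE.findall(text)


def missing_pngs(readme_path="README.md"):
    """List local PNGs referenced in the README that do not exist on disk."""
    readme = Path(readme_path)
    sources = png_images_in_markdown(readme.read_text(encoding="utf-8"))
    return [
        src
        for src in sources
        if not src.startswith(("http://", "https://"))
        and not (readme.parent / src).exists()
    ]
```

With both visualizations and the infographic embedded, png_images_in_markdown should find three paths and missing_pngs() should return an empty list.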

Grading

Criteria                             Points
Visualization 1 & associated prose   4 pts
Visualization 2 & associated prose   4 pts
Infographic                          4 pts
Total                                16 pts

As usual, the visualizations should follow the best practices and everything you’ve learned in class up to this point. E.g., they should include axis labels, appropriate scales, titles, legends, and annotations, and be neat and clean (not cluttered). Points will be deducted for poor-quality or confusing visualizations. Likewise, points will be deducted for spelling and grammar mistakes or for not following the directions.

Tips, tricks, and troubleshooting

If you run into trouble, first look at our relevant tutorials, which have tips & tricks:


© 2023 Cody Dunne. Released under the CC BY-SA license.