Just A Little More Python

than what I could cover in Just Enough Python

Published: 2022-11-22

Warning

This tutorial is under construction.

I wish I had time to cover everything in-depth, but then I’d have to change the title from “Just Enough Python” to “Way More Python Than We Could Possibly Cover In An Hour”. So as a compromise, here’s just a little more Python.

more comprehensions

dictionary comprehensions

Python
movies = ['star wars episode V', 'the godfather', 'the dark knight']
years = [1980, 1972, 2008]
movie_dict = {}
for movie, year in zip(movies, years):
    movie_dict[movie] = year
movie_dict
{'star wars episode V': 1980, 'the godfather': 1972, 'the dark knight': 2008}
Python
movies = ['star wars episode V', 'the godfather', 'the dark knight']
years = [1980, 1972, 2008]
movie_dict = {movie: year for movie, year in zip(movies, years)}
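As a quick check that the loop and the comprehension above are interchangeable, this small sketch builds the dictionary both ways and compares the results. It also shows `dict(zip(...))`, a built-in shortcut that works when the keys and values need no transformation. The names `loop_dict`, `comp_dict`, `zip_dict`, and `title_dict` are only illustrative.

```python
movies = ['star wars episode V', 'the godfather', 'the dark knight']
years = [1980, 1972, 2008]

# Version 1: build the dictionary with a for loop.
loop_dict = {}
for movie, year in zip(movies, years):
    loop_dict[movie] = year

# Version 2: build it with a dictionary comprehension.
comp_dict = {movie: year for movie, year in zip(movies, years)}

# Version 3: hand the zipped pairs straight to dict().
zip_dict = dict(zip(movies, years))

print(loop_dict == comp_dict == zip_dict)  # True

# A comprehension can also transform keys or values as it goes,
# which dict(zip(...)) cannot do on its own.
title_dict = {movie.title(): year for movie, year in zip(movies, years)}
print(title_dict['The Godfather'])  # 1972
```

The comprehension is usually the better fit once you need to change or filter items while building the dictionary.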

set comprehensions

Use a set instead of a list when order doesn’t matter and you want every item to be unique.

Python
myset = set([1,2,2,2,3,3,4])
print(myset)
{1, 2, 3, 4}
Python
presidents = ["George Washington", "John Adams", "Thomas Jefferson", 
              "James Madison Jr", "James Monroe", "John Quincy Adams"]
first_names = {name.split()[0] for name in presidents}
print(first_names)
{'John', 'George', 'James', 'Thomas'}

Checking whether a set contains an item is also faster than checking a list: a set finds the item by its hash, while a list has to be scanned one element at a time.
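To see the difference in lookup speed, here is a rough sketch that times the same membership test against a list and a set using the standard library's `timeit` module. The container size and repeat count are arbitrary choices, and the exact timings will vary from machine to machine.

```python
import timeit

# The same 100,000 integers stored two ways.
big_list = list(range(100_000))
big_set = set(big_list)

# Worst case for the list: the item is absent, so every element is checked.
missing = -1

# Both containers give the same answer to the membership question.
print(missing in big_list)  # False
print(missing in big_set)   # False

# Time 100 lookups in each container.
list_time = timeit.timeit(lambda: missing in big_list, number=100)
set_time = timeit.timeit(lambda: missing in big_set, number=100)
print(f"list: {list_time:.6f} s, set: {set_time:.6f} s")
```

On a typical machine the set lookups should finish far sooner, and the gap grows as the containers get larger.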

packages

Using packages works a little differently between R and Python.

In R, you can refer to a function from any package you have installed with double colons:

R script
dplyr::tibble(a = 1:3, b = 4:6)

Alternatively, you can first load the package at the top of your script so you can call the functions you want to use without specifying the package name every time you use them.

R script
library(dplyr)
tibble(a = 1:5, b = 6:10) %>%
    filter(a > 3)

In Python, you must import a package before you can use anything from it, and then prefix each function or class with the package name, like this:

Python script
import pandas
pandas.DataFrame(data = {'a': [1,2,3], 'b': [4,5,6]})

You can give a package a nickname to cut down on the characters you have to type. pandas is often renamed to pd:

Python script
import pandas as pd
pd.DataFrame(data = {'a': [1,2,3], 'b': [4,5,6]})

Alternatively, you can choose to import only the functions you want to use from a package:

Python script
from pandas import DataFrame
DataFrame(data = {'a': [1,2,3], 'b': [4,5,6]})
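The three import styles differ only in the name you type afterwards. The sketch below demonstrates this with `statistics`, a standard-library module used here as a stand-in for pandas so the example runs without installing anything. The nickname `st` is an arbitrary choice, not a convention.

```python
# Style 1: import the module and use its full name.
import statistics
print(statistics.mean([1, 2, 3]))

# Style 2: import the module under a shorter nickname.
import statistics as st
print(st.mean([1, 2, 3]))

# Style 3: import just the function and call it directly.
from statistics import mean
print(mean([1, 2, 3]))

# All three names point to the very same function object.
print(statistics.mean is st.mean is mean)  # True
```

Importing the whole module keeps it obvious where each function comes from, which helps when two packages define functions with the same name.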

object-oriented programming

I wish I had enough time to cover object-oriented programming (OOP) in depth, but we’re only going to scratch the surface.

Everything in Python is an object. This is also true in R. You can find out what class an object belongs to in Python with type():

Python
numbers = [1,2,3,4]
type(numbers)
<class 'list'>
name = "Kelly"
type(name)
<class 'str'>
type(name) == str
True

or in R with class():

R
class(1:3)
[1] "integer"
class(list(1, 2, 3))
[1] "list"
name <- "Kelly"
class(name)
[1] "character"
class(name) == 'character'
[1] TRUE

Objects of different classes have different methods and attributes.

  • methods: functions that operate on an object.
  • attributes: metadata associated with an object.
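The distinction is easy to see on a built-in type. As a small illustration, a complex number stores its two parts as attributes (accessed without parentheses) and offers `conjugate()` as a method (called with parentheses). The variable name `z` is just for this example.

```python
z = complex(3, 4)

# Attributes: data stored on the object, accessed without parentheses.
print(z.real)  # 3.0
print(z.imag)  # 4.0

# Method: a function attached to the object, called with parentheses.
print(z.conjugate())  # (3-4j)

# callable() tells the two apart.
print(callable(z.conjugate))  # True
print(callable(z.real))       # False
```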

Lists have methods that can modify them in place:

Python
print(numbers)
[1, 2, 3, 4]
numbers.append(5)
numbers.reverse()
print(numbers)
[5, 4, 3, 2, 1]
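One consequence of in-place modification that often surprises people coming from R: list methods such as `sort()`, `append()`, and `reverse()` change the list and return `None`, so assigning their result to a variable loses the data. The sketch below contrasts `list.sort()` with the built-in `sorted()`, which leaves the original alone and returns a new list. The variable names are illustrative.

```python
numbers = [3, 1, 2]

# sort() modifies the list in place and returns None.
result = numbers.sort()
print(result)   # None
print(numbers)  # [1, 2, 3]

# sorted() leaves the original untouched and returns a new list.
original = [3, 1, 2]
new_list = sorted(original)
print(original)  # [3, 1, 2]
print(new_list)  # [1, 2, 3]
```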

You can split a string into a list based on a separator character with split():

Python
filename = "path/to/file.txt"
filename.split('.')
['path/to/file', 'txt']
filename.split('.')[0]
'path/to/file'

Use join() to do the opposite of split():

Python
column_names = ['patient_id', 'sample_id', 'collection_date']
','.join(column_names)
'patient_id,sample_id,collection_date'
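Because `split()` and `join()` are opposites, splitting a string and joining the pieces with the same separator gives back the original. The sketch below shows that round trip, plus one common stumbling block: `join()` only accepts strings, so other values have to be converted first. The `sample_ids` list is made up for illustration.

```python
filename = "path/to/file.txt"

# Split on the separator, then join the pieces back together.
parts = filename.split('/')
print(parts)  # ['path', 'to', 'file.txt']
rebuilt = '/'.join(parts)
print(rebuilt == filename)  # True

# join() requires strings, so convert numbers with str() first.
sample_ids = [101, 102, 103]
id_string = ','.join(str(i) for i in sample_ids)
print(id_string)  # 101,102,103
```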

You can even define your own classes! Use the class keyword and define __init__, the method that initializes a new instance of the class. Its first parameter, self, refers to the instance being created.

Python
class Dog:
    def __init__(self, name):
        self.name = name

    def is_good(self):
        return True

name is an attribute that each Dog instance carries, and is_good() is a method of the Dog class. Create a new instance of the Dog class and try using it:

Python
fido = Dog('fido')
print(fido.name)
fido
print(fido.is_good())
True
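To make the attribute/method distinction concrete, the sketch below repeats the Dog class and creates two instances. Each instance keeps its own `name`, while both share the `is_good()` method defined on the class. The second dog, `rex`, is a hypothetical addition for this example.

```python
class Dog:
    def __init__(self, name):
        # Store the name on this particular instance.
        self.name = name

    def is_good(self):
        # Every dog is good, regardless of its name.
        return True

fido = Dog('fido')
rex = Dog('rex')  # hypothetical second instance

# Each instance has its own value for the name attribute.
print(fido.name, rex.name)  # fido rex

# Both instances belong to the same class and share its methods.
print(type(fido) == Dog)                 # True
print(fido.is_good() and rex.is_good())  # True
```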

TODO explain how this works in Snakemake input/output/params etc

Take advantage of object-oriented programming to DRY your code by referencing output files from other rules.