The framework of functions as first-class citizens already gives us a lot of power, but we also need something to apply those functions to.
The most common application of functional programming is sequence manipulation. In particular, these sequences are represented by iterators.
Iterators are objects that can be iterated over, i.e. that one can run a for loop over.
The two defining characteristics of iterators are that they are:

* lazy: they don't return any values until they are explicitly asked to;
* irreversible: once they have returned a value, they don't keep any track of it.

This means that they can only be iterated over once.
This seems counterintuitive: I can iterate over a list as many times as I want!
my_list = [1, 2]
print("--- first iteration ---")
for element in my_list:
print(element)
print("--- second iteration ---")
for element in my_list:
print(element)
--- first iteration ---
1
2
--- second iteration ---
1
2
What is happening is that Python does not actually iterate over the list!
Every time you put the list in a for loop, Python generates an iterator for the list, loops over it and discards it.
We can explicitly do the generation ourselves using the iter function.
my_list = [1, 2]
my_list_iterator = iter(my_list)
print("--- first iteration ---")
for element in my_list_iterator:
print(element)
print("--- second iteration ---")
for element in my_list_iterator:
print(element)
--- first iteration ---
1
2
--- second iteration ---
This is the default behavior for files, for example: once you iterate over them, you have to reset them or they will not iterate anymore!
To reset a file we can use the file.seek(0) method.
But file reading is an exception: resetting the sequence is normally not possible or desirable.
%%file temp.txt
tuna
mayo
bread
Overwriting temp.txt
with open("temp.txt") as infile:
for line in infile:
print(line.strip())
print('-'*20)
for line in infile:
print(line.strip())
tuna
mayo
bread
--------------------
with open("temp.txt") as infile:
for line in infile:
print(line.strip())
print('-'*20)
infile.seek(0)
for line in infile:
print(line.strip())
tuna
mayo
bread
--------------------
tuna
mayo
bread
We can control the advancement of our iterator using the next function:
my_list = [1, 2]
my_list_iterator = iter(my_list)
next(my_list_iterator)
1
next(my_list_iterator)
2
next(my_list_iterator)
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
<ipython-input-154-19e354f49d7b> in <module>
----> 1 next(my_list_iterator)

StopIteration:
The StopIteration exception is the signal that the iteration has ended.
We could in theory implement an artisanal version of the for loop using an infinite while cycle:
mylist = [1, 2, 3]
for element in mylist:
print(element)
1
2
3
iterator = iter(mylist)
while True:
try:
element = next(iterator)
print(element)
except StopIteration:
break
1
2
3
The next function also accepts a default value, returned when the sequence is exhausted:
my_list = [1, 2]
my_list_iterator = iter(my_list)
assert next(my_list_iterator)==1
assert next(my_list_iterator)==2
next(my_list_iterator, "end of iteration")
'end of iteration'
Of course this is not a very efficient way of doing it, but it helps to clarify what is going on under the hood.
The most important property of iterators is that, by combining laziness and irreversibility, they can represent huge (or even infinite) sequences without any issue.
After all, as long as you ask to see only some of the elements rather than all of them, the iterator is not going to be fazed by it.
This is not hypothetical: the itertools library contains a function count that literally counts from 0 to infinity!
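For example, we can look at just the first few values of count without ever materializing the infinite sequence (a small illustration using itertools.islice):

from itertools import count, islice
# islice lazily takes only the first 5 values of the infinite counter
list(islice(count(), 5))
[0, 1, 2, 3, 4]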
Some of the most common iterators we have used are the ones produced by the range and enumerate functions.
In particular, enumerate is a function that takes an iterator as input and returns another iterator as output.
This is very similar to what we saw when we discussed function manipulation.
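For instance, enumerate consumes its input lazily, one element per call to next:

letters = iter(['a', 'b', 'c'])
numbered = enumerate(letters)
next(numbered)
(0, 'a')
next(numbered)
(1, 'b')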
The Python structure that implements these transformations while maintaining the lazy and lightweight nature of iterators is the generator.
Generators are special functions that perform lazy iteration:

* when called, they create a generator object but do not execute anything (more or less, see later);
* the proper execution starts when one iterates over them;
* for each call to next, they return one value, pause their execution and wait to be called again.
In a way they are half-way between a function and an object
for more in-detail information, read: https://www.dabeaz.com/generators/
The only difference between a generator and a normal function is the use of yield instead of return to give back the results.
Using return inside a generator is possible, but it does not return a value to the caller: it simply terminates the iteration (internally, it raises StopIteration).
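A small sketch of what happens when a return is reached:

def stop_early():
    yield 1
    return  # the generator stops here: no more values
    yield 2  # never reached
list(stop_early())
[1]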
def my_generator():
yield 1
yield 4
yield 9
for i in my_generator():
print(i)
1
4
9
iterable = my_generator()
print(next(iterable))
print(next(iterable))
print(next(iterable))
1
4
9
# they don't execute until iterated over!
def my_generator():
print("prepare the state. this does not execute until the first `next`")
yield 1
yield 4
yield 9
gen = my_generator()
next(gen)
prepare the state. this does not execute until the first `next`
1
There are two main uses for generators:

* generating a sequence of values, possibly infinite (like range)
* transforming an existing sequence one element at a time (like enumerate)
)# generate an infinite sequence
def fibonacci():
a, b = 1, 1
while True:
yield a
a, b = b, a+b
sequence = fibonacci()
numbers = [next(sequence) for i in range(8)]
numbers
[1, 1, 2, 3, 5, 8, 13, 21]
# modify an existing sequence
def square(sequence):
for element in sequence:
yield element**2
data = [1, 2, 3, 4]
list(square(data))
[1, 4, 9, 16]
yield from - concatenating generators

def _cat(seq_1, seq_2):
for element in seq_1:
yield element
for element in seq_2:
yield element
def cat(seq_1, seq_2):
yield from seq_1
yield from seq_2
data_1 = [1, 2, 3]
data_2 = [4, 5, 6]
list(cat(data_1, data_2))
[1, 2, 3, 4, 5, 6]
def cat(*sequences):
for sequence in sequences:
yield from sequence
data_1 = [1, 2, 3]
data_2 = [4, 5, 6]
list(cat(data_1, data_2))
[1, 2, 3, 4, 5, 6]
A generator comprehension is a compact way of writing a simple generator such as:

def _(seq):
    for element in seq:
        yield element**2
data = (x**2 for x in [1, 2, 3])
print(data)
<generator object <genexpr> at 0x7fbd1c1c8678>
list(data)
[1, 4, 9]
As a matter of fact, list, dict and set comprehensions can be seen as a particular case of generator comprehension that also concretizes the generator into a data structure!
Thanks to laziness, one does not need to evaluate all the results of a computation, but only the ones actually needed. This can be important when calculating the items is expensive.
consider this kind of code:
any([item.is_positive() for item in collection])
this will evaluate the is_positive function for all the items in the collection, even though any only needs to find one positive result to return True.
All the values computed after that are useless!
Using a generator expression we can avoid all the function calls after the first positive result... and it's also shorter!
any(item.is_positive() for item in collection)
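To see the difference we can adapt the example to a plain function with a visible side effect (noisy_is_positive is just an illustrative name): the generator version stops checking after the first positive element.

def noisy_is_positive(n):
    print(f"checking {n}")
    return n > 0

collection = [-1, 2, -3, 4]
# the list comprehension evaluates the function on every element
print(any([noisy_is_positive(n) for n in collection]))
# the generator expression stops at the first True
print(any(noisy_is_positive(n) for n in collection))
checking -1
checking 2
checking -3
checking 4
True
checking -1
checking 2
True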
Generators are also a very useful tool to create context managers, using the contextmanager decorator from the contextlib module.
This decorator takes a generator as input and returns a context manager as output.
from contextlib import contextmanager
@contextmanager
def test_manager(name):
print(f"hello {name}, entering the manager")
yield
print(f"hello {name}, exiting the manager")
with test_manager("Franco"):
print("I'm inside the manager")
hello Franco, entering the manager
I'm inside the manager
hello Franco, exiting the manager
Context managers are an extremely common structure in programming. They are useful any time one has to guarantee that something will happen before and after a piece of code, as in the following examples.
@contextmanager
def duration():
from datetime import datetime
import logging
start = datetime.now()
try:
yield
finally:
end = datetime.now()
logging.warning(end-start)
with duration():
import time
time.sleep(0.5)
a = [i for i in range(1000)]
WARNING:root:0:00:00.500756
@contextmanager
def print_as_log():
import logging
_old_print = print
globals()['print'] = logging.warning
try:
yield
finally:
globals()['print'] = _old_print
with print_as_log():
print("ciao mondo!")
WARNING:root:ciao mondo!
@contextmanager
def get_sorted_list():
result = []
yield result
result.sort()
with get_sorted_list() as mylist:
mylist.append(3)
mylist.append(1)
print(mylist)
[1, 3]
@contextmanager
def ignore_exceptions(*exception_classes):
try:
yield
except exception_classes:
pass
with ignore_exceptions(IndexError, ValueError):
a = [1, 2]
a[4]
with ignore_exceptions(IndexError):
a = 1 + "2"
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-38-792d97826bdf> in <module>
      1 with ignore_exceptions(IndexError):
----> 2     a = 1 + "2"
      3

TypeError: unsupported operand type(s) for +: 'int' and 'str'
with ignore_exceptions(IndexError, TypeError):
a = 1 + "2"
A very common operation on sequences is the mapping operation, which consists of applying a function to all the elements of an iterable and returning the results one at a time.
For example, I might want to obtain the square of every number in a sequence
numbers = [0, 1, 2, 3, 4, 5, 6]
squares = []
for number in numbers:
square = number **2
squares.append(square)
print(squares)
[0, 1, 4, 9, 16, 25, 36]
This can be written in a shorter and more readable way using a generator comprehension, which has the same behavior as the code above.
numbers = [0, 1, 2, 3, 4, 5, 6]
squares = (x**2 for x in numbers)
print(list(squares))
[0, 1, 4, 9, 16, 25, 36]
The concept of map is an abstraction of this idea.
Python provides a function, called map, that takes a function and an iterator and returns a new iterator whose elements are the function applied to each element of the original sequence.
def square(n):
return n**2
numbers = [0, 1, 2, 3, 4, 5, 6]
squares = map(square, numbers)
print(squares)
<map object at 0x7fbd0e6e6e80>
We have to remember that the result is an iterator and, as such, we don't obtain any results until we explicitly ask for them.
If we want to see the results we can concretize them in a container such as a list.
list(squares)
[0, 1, 4, 9, 16, 25, 36]
We should also remember that iterators are irreversible, so we can only iterate on them once.
This means that we can only create a list out of them once.
list(squares)
[]
The basic implementation of map as a generator is the following:
def _map(function, sequence):
    for item in sequence:
        yield function(item)
The builtin map function has an additional ability: if the function takes multiple parameters and several iterators are passed, it returns the sequence of the function applied to the tuples of items drawn from the various sequences.
def _add(a, b):
return a+b
list(map(_add, [1, 2, 3], [4, 5, 6]))
[5, 7, 9]
so the complete implementation of map would be something like:
def _map(function, *sequences):
for items in zip(*sequences):
yield function(*items)
list(_map(_add, [1, 2, 3], [4, 5, 6]))
[5, 7, 9]
We could implement an alternative version of map that takes named arguments instead of just positional ones.
For each named parameter we pass a sequence, so that we can call the function as:
def _add(a, b):
return a+b
list(_map(_add, b=[1, 2, 3], a=[4, 5, 6]))
To implement this we have to split the keyword sequences into a series of dictionaries, one per function call:
kwargs = {'a': [1, 2, 3], 'b': [4, 5, 6]}
names = kwargs.keys()
values = kwargs.values()
first = next(zip(*values))
first
(1, 4)
list(zip(names, first))
[('a', 1), ('b', 4)]
values_pairs = list(zip(*values))
values_pairs
[(1, 4), (2, 5), (3, 6)]
couples = [list(zip(names, value)) for value in values_pairs]
couples
[[('a', 1), ('b', 4)], [('a', 2), ('b', 5)], [('a', 3), ('b', 6)]]
[dict(c) for c in couples]
[{'a': 1, 'b': 4}, {'a': 2, 'b': 5}, {'a': 3, 'b': 6}]
def _map(function, **kw_seq):
names = kw_seq.keys()
kw_sequences = kw_seq.values()
kwargs_seq = zip(*kw_sequences)
kw_seq = (dict(zip(names, values)) for values in kwargs_seq)
for kwarg in kw_seq:
yield function(**kwarg)
# an asymmetric function makes it visible that arguments are matched by name
def _sub(a, b):
    return a-b
list(_map(_sub, b=[1, 2, 3], a=[4, 5, 6]))
[3, 3, 3]
from itertools import repeat
def _sub_pow(a, b, p):
    return a-b**p
list(_map(_sub_pow, b=[1, 2, 3], a=[4, 5, 6], p=repeat(2)))
[3, 1, -3]
Filter uses a similar logic: instead of transforming the elements with a function, it selects only those elements of the sequence for which a predicate is true.
This is implemented by the filter function in base Python.
numbers = [-2, -1, 0, 1, 2]
positive = []
for number in numbers:
if number>0:
positive.append(number)
print(positive)
[1, 2]
In a similar way to the map operation, the filter can be implemented as a comprehension
numbers = [-2, -1, 0, 1, 2]
positive = (x for x in numbers if x>0)
print(list(positive))
[1, 2]
It is also possible to combine the map and filter operation in a comprehension, obtaining the full expression for the generator comprehension!
numbers = [-2, -1, 0, 1, 2]
squares_of_positive = (x**2 for x in numbers if x>0)
print(list(squares_of_positive))
[1, 4]
Exactly as before, there is a function, filter, that takes a function returning true or false (keep the element or drop it) and an iterator:
def is_positive(n):
return n>0
positivi = filter(is_positive, numbers)
print(list(positivi))
[1, 2]
From a formal perspective a filter function should be a pure function, so it should not depend on any state.
Using generators it is possible to implement less pure, but sometimes more useful, kinds of filtering.
# keep only the incremental values, dropping those that diminish
def incremental(sequence):
last = None
for item in sequence:
if last is None or item>=last:
yield item
last = item
data = [1, 2, 3, 1, 4, 2, 5]
list(incremental(data))
[1, 2, 3, 4, 5]
# we can generalize by returning the pairs of elements
def pairwise(seq):
seq = iter(seq)
previous = next(seq)
for element in seq:
yield (previous, element)
previous = element
data = [1, 2, 3, 1, 4, 2, 5]
pairs = list(pairwise(data))
print(pairs)
[(1, 2), (2, 3), (3, 1), (1, 4), (4, 2), (2, 5)]
# keep only the values that are not smaller than the previous one
result = (p[1] for p in pairs if p[1]>=p[0])
list(result)
[2, 3, 4, 5]
The choice of parameter ordering, such as taking the function as the first argument rather than the last, might sometimes seem weird.
This is often due to the desire to combine these functions with partial application and the other functional tools seen in the previous lessons.
def is_positive(x):
return x>0
def square(x):
return x**2
data = [-1, 2, -3, 4]
result = map(square, filter(is_positive, data))
print(list(result))
[4, 16]
from functools import partial
only_positive = partial(filter, is_positive)
get_squares = partial(map, square)
result = get_squares(only_positive(data))
print(list(result))
[4, 16]
Finally, after we transform and filter the data (and combine them in various ways), we can perform a reduction.
Reduction operations take the elements one by one and combine them into a single "pot".
One simple example is summing all the observed values.
numbers = [1, 2, 3, 4]
total = 0
for number in numbers:
total += number
print(total)
10
As before, there is a function that implements this, given a sequence and a combination function; the only difference is that it lives not in the builtins but in the functools module.
from functools import reduce
def add(a, b):
return a+b
numbers = [1, 2, 3, 4]
total = reduce(add, numbers, 0)
print(total)
10
These kinds of operations are so common that the most frequent ones are already implemented as builtin functions, such as sum, min, max, any and all.
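For instance:

numbers = [1, 2, 3, 4]
print(sum(numbers), min(numbers), max(numbers))
print(any(n > 3 for n in numbers), all(n > 3 for n in numbers))
10 1 4
True False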
A typical reduction that is often used is frequency counting:
from collections import Counter
numbers = [1, 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 4]
Counter(numbers)
Counter({1: 3, 2: 2, 3: 2, 4: 5})
The interesting property of reductions is that their results can be combined: given the counts of two series, I can sum the counts and obtain the counts of the concatenation of the two original series.
For example, in the case of Counter, various Counters can be summed together.
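For example:

from collections import Counter
Counter([1, 1, 2]) + Counter([2, 3])
Counter({1: 2, 2: 2, 3: 1})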
The formal definition of a reduction has three ingredients:

* DATA: the sequence of elements to combine;
* DEFAULT: the initial value of the accumulation;
* FUNCTION: a function that takes the accumulated value and the next element and returns the new accumulated value.
def double(x): return 2*x
def square(x): return x**2
def apply_function(number, function):
return function(number)
reduce(apply_function, [double, square, double], 2)
32
def generate_pipe(*functions):
def execute(value):
return reduce(apply_function, functions, value)
return execute
pipe = generate_pipe(double, square, double)
pipe(2)
32
The famous MAP-REDUCE approach is a combination of these approaches: a MAP step that transforms each element independently (and can therefore run in parallel), and a REDUCE step that combines the partial results into the final answer.
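A minimal sketch of the idea with the tools seen so far (a real map-reduce framework would also distribute the MAP step over several workers):

from functools import reduce

data = [-1, 2, -3, 4]
# MAP: transform each element independently (this part is parallelizable)
squares = map(lambda x: x**2, data)
# REDUCE: combine the partial results into a single value
reduce(lambda acc, x: acc + x, squares, 0)
30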
Python provides a few useful functions to manipulate iterators at the builtins level, namely map, filter, zip, enumerate and range, but it also provides plenty of other really useful functions in the itertools module.
Infinite iterators:
| Iterator | Arguments | Results | Example |
|---|---|---|---|
| count() | start, [step] | start, start+step, start+2*step, … | count(10) --> 10 11 12 13 14 ... |
| cycle() | p | p0, p1, … plast, p0, p1, … | cycle('ABCD') --> A B C D A B C D ... |
| repeat() | elem [,n] | elem, elem, elem, … endlessly or up to n times | repeat(10, 3) --> 10 10 10 |
Iterators terminating on the shortest input sequence:
| Iterator | Arguments | Results | Example |
|---|---|---|---|
| accumulate() | p [,func] | p0, p0+p1, p0+p1+p2, … | accumulate([1,2,3,4,5]) --> 1 3 6 10 15 |
| chain() | p, q, … | p0, p1, … plast, q0, q1, … | chain('ABC', 'DEF') --> A B C D E F |
| chain.from_iterable() | iterable | p0, p1, … plast, q0, q1, … | chain.from_iterable(['ABC', 'DEF']) --> A B C D E F |
| compress() | data, selectors | (d[0] if s[0]), (d[1] if s[1]), … | compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F |
| dropwhile() | pred, seq | seq[n], seq[n+1], starting when pred fails | dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1 |
| filterfalse() | pred, seq | elements of seq where pred(elem) is false | filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8 |
| groupby() | iterable[, key] | sub-iterators grouped by value of key(v) | |
| islice() | seq, [start,] stop [, step] | elements from seq[start:stop:step] | islice('ABCDEFG', 2, None) --> C D E F G |
| starmap() | func, seq | func(*seq[0]), func(*seq[1]), … | starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000 |
| takewhile() | pred, seq | seq[0], seq[1], until pred fails | takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4 |
| tee() | it, n | it1, it2, … itn splits one iterator into n | |
| zip_longest() | p, q, … | (p[0], q[0]), (p[1], q[1]), … | zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D- |
Combinatoric iterators:
| Iterator | Arguments | Results |
|---|---|---|
| product() | p, q, … [repeat=1] | cartesian product, equivalent to a nested for-loop |
| permutations() | p[, r] | r-length tuples, all possible orderings, no repeated elements |
| combinations() | p, r | r-length tuples, in sorted order, no repeated elements |
| combinations_with_replacement() | p, r | r-length tuples, in sorted order, with repeated elements |

| Example | Results |
|---|---|
| product('ABCD', repeat=2) | AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD |
| permutations('ABCD', 2) | AB AC AD BA BC BD CA CB CD DA DB DC |
| combinations('ABCD', 2) | AB AC AD BC BD CD |
| combinations_with_replacement('ABCD', 2) | AA AB AC AD BB BC BD CC CD DD |
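A few of these functions in action:

import itertools as it
print(list(it.accumulate([1, 2, 3, 4, 5])))
print(list(it.compress('ABCDEF', [1, 0, 1, 0, 1, 1])))
print(list(it.combinations('ABCD', 2)))
[1, 3, 6, 10, 15]
['A', 'C', 'E', 'F']
[('A', 'B'), ('A', 'C'), ('A', 'D'), ('B', 'C'), ('B', 'D'), ('C', 'D')]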
The main way to implement true parallelism in Python right now is to use the multiprocessing module.
This module implements many ways of working with parallel processes.
The one we are interested in is Pool.map, which implements a parallel version of the map function.
import multiprocessing as mp
data = range(5)
def squares(x):
return x**2
with mp.Pool(processes=4) as pool:
result = pool.map(squares, data)
result
[0, 1, 4, 9, 16]
An example of how to distribute a list of strings and transform them to lowercase in a parallel fashion.

from itertools import zip_longest

_missing = object()
def grouper(iterable, n, fillvalue=_missing):
    "Collect data into fixed-length chunks or blocks"
    # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx
    # grouper('ABCDEFG', 3) --> ABC DEF G (the default sentinel is stripped)
    args = [iter(iterable)] * n
    sets = zip_longest(*args, fillvalue=fillvalue)
    return ([i for i in s if i is not _missing] for s in sets)
list(grouper('ABCDEFG', 3))
[['A', 'B', 'C'], ['D', 'E', 'F'], ['G']]
import itertools as it
sequence = iter('ABCDEFG')
def mapl(f, *seqs):
return list(map(f, *seqs))
lower = partial(mapl, str.lower)
groups = grouper(sequence, 3)
with mp.Pool(processes=4) as pool:
result = pool.map(lower, groups)
result = list(it.chain.from_iterable(result))
result
['a', 'b', 'c', 'd', 'e', 'f', 'g']
Scan all the files in a directory of your computer, including the subdirectories, and count the number of files of each extension type.
The extension to be considered is the one after the last dot, lowercase.
The count should be computed for each subdirectory, together with the overall total.
If possible, implement it in parallel.
Suggestion: check the os.walk function to explore the filesystem from a starting point.
import os
import itertools as it
from collections import Counter
def get_extension(fname):
return fname.split('.')[-1]
def is_image(ext):
return ext in images_extension
images_extension = ['png', 'jpg', 'svg']
flatten_lists = it.chain.from_iterable
make_lowercase = partial(map, str.lower)
extract_extensions = partial(map, get_extension)
select_images = partial(filter, is_image)
files_seq = (files for _, _, files in os.walk('.'))
files = flatten_lists(files_seq)
files = make_lowercase(files)
extension = extract_extensions(files)
images = select_images(extension)
Counter(images)
Counter({'png': 45, 'svg': 7})
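The pipeline above produces a single overall Counter; a minimal sketch of the per-directory count asked by the exercise (the helper name is illustrative) could be:

import os
from collections import Counter

def count_extensions(files):
    # count the lowercase extension after the last dot
    return Counter(fname.rsplit('.', 1)[-1].lower() for fname in files)

# one Counter per directory
per_directory = {directory: count_extensions(files)
                 for directory, _, files in os.walk('.')}
# reductions compose: the overall total is the sum of the partial Counters
total = sum(per_directory.values(), Counter())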
from pathlib import Path
def _is_hidden_path(path):
return path.absolute().stem.startswith('.')
def my_walk(position, *, remove_hidden=True):
"""generator reimplementing the os.walk function using the `yield from` syntax"""
# convert the parameter to make sure it's a path object
position = Path(position)
# gets all the files and directories in the current path
paths = list(position.iterdir())
# select the files and folders
sub_files = [path for path in paths if not path.is_dir()]
sub_dirs = [path for path in paths if path.is_dir()]
if remove_hidden:
sub_files = [path for path in sub_files if not _is_hidden_path(path)]
sub_dirs = [path for path in sub_dirs if not _is_hidden_path(path)]
# returns the current position, subdirectories and files
yield position, sub_dirs, sub_files
# returns all the content of all the subdirectories
for sub_dir in sub_dirs:
yield from my_walk(sub_dir)
for pos, dirs, files in my_walk('immagini'):
print(pos)
print([d.name for d in dirs])
print([f.name for f in files])
print()
immagini
['property']
['interfaccia_fossil_3.png', 'power_triangle.png', 'interfaccia_fossil_2.png', 'snakemake_dag.svg', 'interfaccia_fossil_7.png', 'tradeoff_spazio_velocita.svg', 'interfaccia_fossil_8.png', 'interfaccia_fossil_10.png', 'time.png', 'net.png', 'time2.png', 'interfaccia_fossil_4.png', 'FastPCA.png', 'tradeoff_spazio_velocita.png', 'interfaccia_fossil_9.png', 'interfaccia_fossil_5.png', 'vec.png', 'realtime.png', 'interfaccia_fossil_6.png', 'mapreduce.png', 'interfaccia_fossil_1.png', 'snakemake_dag.png', '5v.png', 'google.png']

immagini/property
[]
['property_commutative.png', 'property_induction.png', 'property_invariant.png', 'property_test_oracle.png', 'property_inverse.png', 'property_easy_verification.png', 'property_idempotence.png']
Python can also execute code contained in strings at runtime; the eval function evaluates an expression and returns its value:
a = 4
b = 7
eval('a+b')
11
eval also accepts explicit globals and locals mappings; here the dictionary a acts as the local namespace:
a = {'a':1, 'b':2}
result = eval("a + b", globals(), a)
print(a, result)
{'a': 1, 'b': 2} 3
exec executes statements instead of expressions; the assignment to c ends up in the dictionary used as locals:
a = {'a':1, 'b':2}
exec("c = a+b", globals(), a)
a
{'a': 1, 'b': 2, 'c': 3}
compile transforms a string into a code object that can be evaluated several times without being re-parsed:
expr = compile("a+b", filename='__main__', mode='eval')
expr
<code object <module> at 0x7f8ac41a2e40, file "__main__", line 1>
a = {'a':1, 'b':2}
eval(expr, globals(), a)
3
a
{'a': 1, 'b': 2}
The ast module goes one step deeper: it parses source code into an abstract syntax tree that can be inspected and manipulated before being compiled:
import ast
commands = """
a = 2+3
b = 4
print(a+b)
"""
tree = ast.parse(commands)
print(tree)
<_ast.Module object at 0x7f0988910390>
# expect the value 9
compiled_expr = compile(tree, filename="<ast>", mode="exec")
exec(compiled_expr)
9
def walker(branch, offset=0, indent='\t'):
for child in ast.iter_child_nodes(branch):
print(indent*offset, child)
walker(child, offset=offset+1, indent=indent)
walker(tree, indent=' ')
<_ast.Assign object at 0x7f0988910400>
  <_ast.Name object at 0x7f0988910470>
    <_ast.Store object at 0x7f098f1c4dd8>
  <_ast.BinOp object at 0x7f09889103c8>
    <_ast.Num object at 0x7f0988910550>
    <_ast.Add object at 0x7f098f1c7be0>
    <_ast.Num object at 0x7f09889104a8>
<_ast.Assign object at 0x7f09889101d0>
  <_ast.Name object at 0x7f0988910780>
    <_ast.Store object at 0x7f098f1c4dd8>
  <_ast.Num object at 0x7f0988910438>
<_ast.Expr object at 0x7f09889104e0>
  <_ast.Call object at 0x7f0988910b00>
    <_ast.Name object at 0x7f0988910630>
      <_ast.Load object at 0x7f098f1c4cc0>
    <_ast.BinOp object at 0x7f09889105f8>
      <_ast.Name object at 0x7f0988910940>
        <_ast.Load object at 0x7f098f1c4cc0>
      <_ast.Add object at 0x7f098f1c7be0>
      <_ast.Name object at 0x7f09889108d0>
        <_ast.Load object at 0x7f098f1c4cc0>