
Random Pandas Notes, Note 2

2019-11-21

Reference: https://www.mikulskibartosz.name/how-to-split-a-list-inside-a-dataframe-cell-into-rows-in-pandas/

Given a table containing a column of lists, like:

id	name	tags
1	winter squash	[‘60-minutes’, ‘time-to-make’, ‘healthy’]
2	braised pork	[‘2-hours-more’, ‘time-to-make’, ‘chinese’]
3	chilli Beef	[‘4-hours-more’, ‘chinese’]

We want to turn it into the following, like the ‘$unwind’ operation in MongoDB:

id	name	tags
1	winter squash	‘60-minutes’
1	winter squash	‘time-to-make’
1	winter squash	‘healthy’
2	braised pork	‘2-hours-more’
2	braised pork	‘time-to-make’
2	braised pork	‘chinese’
3	chilli Beef	‘4-hours-more’
3	chilli Beef	‘chinese’

Here’s how we do it:

tags_df['tags'] = tags_df.tags.apply(lambda x: x[1:-1].split(','))
clean_tag_df = tags_df.tags.apply(pd.Series).merge(tags_df, right_index = True, left_index = True) \
    .drop(["tags"], axis = 1) \
    .melt(id_vars = ['id', 'name'], value_name = "tags") \
    .drop("variable", axis = 1).dropna()

Breaking it down, the first step turns the tags from a string into a list of strings:

tags_df['tags'] = tags_df.tags.apply(lambda x: x[1:-1].split(','))

Next step, turning a list of tags into multiple columns:

tags_df.tags.apply(pd.Series)

This turns the table into:

0	1	2
‘60-minutes’	‘time-to-make’	‘healthy’
‘2-hours-more’	‘time-to-make’	‘chinese’
‘4-hours-more’	‘chinese’	NaN

Then, we join the tag columns with the rest of the table on the index (prev_df stands for the result of the previous step):

prev_df.merge(tags_df, right_index = True, left_index = True)

Then, we drop the now-redundant “tags” column and melt the per-position tag columns into separate rows:

prev_df.drop(["tags"], axis = 1).melt(id_vars = ['id', 'name'], value_name = "tags")

Lastly, we remove the “variable” column, which we no longer need, and drop the rows where the tag is NaN:

prev_df.drop("variable", axis = 1).dropna()
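To see the whole pipeline run end to end, here is a minimal sketch on the three example rows. It assumes the tags arrive as bracketed strings and that the identifying columns are named id and name as in the table above. Splitting on ", " and stripping the quote characters are small cleanup steps added for this illustration, and the final sort is only there to make the output easy to read.

```python
import pandas as pd

# Sample data mirroring the table above; each tags value is a single string.
tags_df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["winter squash", "braised pork", "chilli Beef"],
    "tags": [
        "['60-minutes', 'time-to-make', 'healthy']",
        "['2-hours-more', 'time-to-make', 'chinese']",
        "['4-hours-more', 'chinese']",
    ],
})

# String -> list of strings (brackets removed, quotes stripped from each tag).
tags_df["tags"] = tags_df["tags"].apply(
    lambda x: [t.strip("'") for t in x[1:-1].split(", ")]
)

clean_tag_df = (
    tags_df["tags"].apply(pd.Series)                     # one column per tag position
    .merge(tags_df, left_index=True, right_index=True)   # re-attach id and name
    .drop(["tags"], axis=1)                              # drop the list column
    .melt(id_vars=["id", "name"], value_name="tags")     # tag columns -> rows
    .drop("variable", axis=1)
    .dropna()                                            # remove padding NaNs
    .sort_values("id", kind="mergesort")                 # stable sort, for display
    .reset_index(drop=True)
)

print(clean_tag_df)
```

Newer pandas versions (0.25 and later) also provide DataFrame.explode, which covers the same use case in one call once the column holds real lists.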

Random Pandas Notes, Note 1

2019-10-19

To set a value based on more than one condition, use the following:

df['A'] = np.where(((df['B'] == 'some_value') & (df['C'] == 'some_other_value')), true_value, false_value)

Another approach is to use loc. Note that this method does not assign a default value where the condition is not met, so you may need a second loc assignment for the remaining rows or a follow-up .fillna() call.

df.loc[(df.B == 'some_value') | (df.C == 'some_other_value'), 'A'] = true_value

P.S. For element-wise ‘and’ and ‘or’ operations, use the symbols ‘&’ and ‘|’, and wrap each condition in parentheses.
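As a quick illustration of the difference between the two approaches, here is a small sketch on a made-up frame. The column values and the "hit"/"miss" labels are placeholders chosen for this example only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, just for illustration.
df = pd.DataFrame({
    "B": ["x", "x", "y", "y"],
    "C": ["p", "q", "p", "q"],
})

# np.where: every row receives either the true value or the false value.
df["A"] = np.where((df["B"] == "x") & (df["C"] == "p"), "hit", "miss")

# loc: only the rows matching the mask are assigned; the rest stay NaN
# until they are filled in a second step.
df.loc[(df["B"] == "x") | (df["C"] == "p"), "A2"] = "hit"
df["A2"] = df["A2"].fillna("miss")

print(df)
```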

A Brief Summary of Sorting Algorithm, Part 1

2019-10-13

Basics:
- in-place sorting vs. out-of-place sorting
- internal sorting vs. external sorting
- stable vs. unstable sorting

Algorithms covered:
- Bubble Sort
- Selection Sort
- Insertion Sort
- Merge Sort
- Quick Sort
- Heap Sort

Internal vs. external sorting

Internal and external sorting describe where the sorting takes place:

- internal sorting happens entirely in main memory
- external sorting uses the hard disk or other external storage, typically because the data does not fit in memory
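To make the idea of external sorting a bit more concrete, here is a rough sketch: sort small chunks in memory, write each one to a temporary file as a sorted run, then merge the runs. The function name external_sort and the chunk_size parameter are made up for this illustration, and a real implementation would need to handle far larger inputs and other record formats.

```python
import heapq
import tempfile

def external_sort(numbers, chunk_size):
    """Yield the integers from `numbers` in sorted order, using sorted runs on disk."""
    run_files = []

    def write_run(chunk):
        # Internal sort of one chunk, stored on disk as a sorted run.
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines("%d\n" % x for x in sorted(chunk))
        f.seek(0)
        run_files.append(f)

    chunk = []
    for x in numbers:
        chunk.append(x)
        if len(chunk) == chunk_size:
            write_run(chunk)
            chunk = []
    if chunk:
        write_run(chunk)

    # heapq.merge lazily merges the sorted runs, reading one line at a time.
    runs = [(int(line) for line in f) for f in run_files]
    try:
        for value in heapq.merge(*runs):
            yield value
    finally:
        for f in run_files:
            f.close()

print(list(external_sort([5, 3, 8, 1, 9, 2, 7], chunk_size=3)))
```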

Stable vs. unstable sorting

A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted.

Stable sorting algorithms include:
1. Bubble Sort
2. Insertion Sort
3. Merge Sort
4. Counting Sort
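A small example of what stability means in practice, using Python's built-in sorted, which is documented to be stable. The records are made up for this illustration.

```python
# (name, grade) records; we sort by grade only.
records = [("alice", 2), ("bob", 1), ("carol", 2), ("dave", 1)]

by_grade = sorted(records, key=lambda r: r[1])

# Records with equal grades keep their input order:
# bob stays before dave, and alice stays before carol.
print(by_grade)
```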

Bubble Sort

Bubble sort is a stable sorting algorithm. It repeatedly compares adjacent elements and swaps them if the left one is larger than the right one, so after each pass the largest remaining element has moved to the end. The time complexity of bubble sort is O(n*n).

def bubbleSort(arr): 
    n = len(arr) 
    for i in range(n): 
        for j in range(0, n-i-1): 
            if arr[j] > arr[j+1] : 
                arr[j], arr[j+1] = arr[j+1], arr[j] 
    return arr
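One common variation, sketched here as an optional extra rather than part of the code above, stops as soon as a full pass makes no swaps, which means the array is already sorted. The function name and the returned pass count are only for illustration.

```python
def bubble_sort_early_exit(arr):
    """Bubble sort that stops once a pass makes no swaps; returns (arr, passes)."""
    n = len(arr)
    passes = 0
    for i in range(n):
        swapped = False
        for j in range(0, n - i - 1):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        passes += 1
        if not swapped:
            # No swaps in this pass, so the array is sorted.
            break
    return arr, passes

print(bubble_sort_early_exit([1, 2, 3, 4]))
print(bubble_sort_early_exit([3, 1, 2]))
```

On already sorted input this variant finishes after a single pass, although the worst case is still O(n*n).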

Selection Sort

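In every iteration of selection sort, the minimum element (for ascending order) of the unsorted subarray is picked and moved to the end of the sorted subarray. The swap-based version below is not stable, because a swap can move an element past others with an equal key; selection sort can be made stable by inserting the minimum instead of swapping. The time complexity of selection sort is O(n*n).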

def selectionSort(arr):
    for i in range(len(arr)): 
        min_idx = i 
        for j in range(i+1, len(arr)): 
            if arr[min_idx] > arr[j]: 
                min_idx = j 
        tmp = arr[i]
        arr[i] = arr[min_idx]
        arr[min_idx] = tmp
    return arr
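Whether the swap-based version preserves the order of equal keys can be checked with a small experiment. The sketch below is the same algorithm with a key function added (an assumption made for this example) so that records with equal keys can be told apart.

```python
def selection_sort_by_key(items, key):
    """Swap-based selection sort on a copy of `items`, comparing key(item)."""
    items = list(items)
    for i in range(len(items)):
        min_idx = i
        for j in range(i + 1, len(items)):
            if key(items[j]) < key(items[min_idx]):
                min_idx = j
        items[i], items[min_idx] = items[min_idx], items[i]
    return items

records = [("a", 2), ("b", 2), ("c", 1)]
result = selection_sort_by_key(records, key=lambda r: r[1])

# "a" came before "b" in the input, but the first swap moves "a" to the end,
# so the two records with key 2 come out in reversed order.
print(result)
```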

Insertion Sort

Insertion sort is stable. In every iteration, the first element of the unsorted part is taken and inserted into its correct position in the sorted part of the array. The time complexity of insertion sort is O(n*n).

def insertionSort(arr): 
    for i in range(1, len(arr)): 
        key = arr[i] 
        j = i-1
        while j >= 0 and key < arr[j] : 
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key 
    return arr
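To watch the sorted part grow, the sketch below records a snapshot of the array after each insertion. The helper name insertion_sort_steps is made up for this illustration; the sorting logic is the same as above.

```python
def insertion_sort_steps(arr):
    """Return a snapshot of the array after each insertion step."""
    arr = list(arr)
    steps = []
    for i in range(1, len(arr)):
        key = arr[i]
        j = i - 1
        # Shift larger elements of the sorted part one slot to the right.
        while j >= 0 and key < arr[j]:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
        steps.append(list(arr))
    return steps

for step in insertion_sort_steps([4, 2, 5, 1]):
    print(step)
```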

Merge Sort

Merge sort is stable. It is a divide-and-conquer algorithm: it divides the input array into two halves, calls itself on each half, and then merges the two sorted halves. The time complexity is O(n*log(n)).

def mergeSort(arr): 
    if len(arr) >1: 
        mid = len(arr)//2
        L = mergeSort(arr[:mid])
        R = mergeSort(arr[mid:])
        i = j = k = 0
        while i < len(L) and j < len(R): 
            if L[i] <= R[j]:  # <= takes equal elements from the left half first (stable)
                arr[k] = L[i] 
                i+=1
            else: 
                arr[k] = R[j] 
                j+=1
            k+=1
        while i < len(L): 
            arr[k] = L[i] 
            i+=1
            k+=1
        while j < len(R): 
            arr[k] = R[j] 
            j+=1
            k+=1
    return arr
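The merge step is where stability is decided: when two keys are equal, the element from the left half has to be taken first. Here is that step on its own as a small sketch; the function name merge_sorted and the key parameter are assumptions made for this example.

```python
def merge_sorted(left, right, key=lambda x: x):
    """Merge two sorted lists into one sorted list, keeping equal keys in order."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # <= means ties are resolved in favour of the left list.
        if key(left[i]) <= key(right[j]):
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged

left = [("a", 1), ("c", 2)]
right = [("b", 1), ("d", 2)]
print(merge_sorted(left, right, key=lambda r: r[1]))
```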

Quick Sort

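For quick sort, we pick an element as the pivot (a random element is a common choice; the code below uses the last element). We compare each element with the pivot so that elements smaller than the pivot end up in the first part of the list and the rest in the second part. After that, quick sort recursively sorts the two parts (divide and conquer). The average time complexity of quick sort is O(n*log(n)); the worst case is O(n*n). The in-place version below is not stable, although quick sort can be made stable.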

def partition(arr,low,high): 
    i = (low-1)
    pivot = arr[high]
    for j in range(low , high): 
        if   arr[j] < pivot: 
            i = i+1 
            arr[i],arr[j] = arr[j],arr[i] 
    arr[i+1],arr[high] = arr[high],arr[i+1] 
    return ( i+1 ) 

def quickSort(arr,low,high): 
    if low < high: 
        pi = partition(arr,low,high) 
        quickSort(arr, low, pi-1) 
        quickSort(arr, pi+1, high)
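The code above always uses the last element as the pivot. A random pivot can be sketched by swapping a randomly chosen element into the last position before partitioning. The function names below are made up for this illustration, and the default arguments are a convenience that the original code does not have.

```python
import random

def partition_random(arr, low, high):
    # Move a randomly chosen element to the end, then partition as before.
    pivot_idx = random.randint(low, high)
    arr[pivot_idx], arr[high] = arr[high], arr[pivot_idx]
    pivot = arr[high]
    i = low - 1
    for j in range(low, high):
        if arr[j] < pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1

def quick_sort_random(arr, low=0, high=None):
    """Sort arr in place with a random pivot and return it."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        pi = partition_random(arr, low, high)
        quick_sort_random(arr, low, pi - 1)
        quick_sort_random(arr, pi + 1, high)
    return arr

print(quick_sort_random([5, 2, 9, 1, 5, 6]))
```

A random pivot makes the O(n*n) worst case unlikely for any particular input, such as an already sorted array, but it does not remove it.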

Heap Sort

1. Build a max heap from the input data.
2. At this point, the largest item is stored at the root of the heap. Swap it with the last item of the heap and reduce the size of the heap by 1. Then heapify the root of the tree.
3. Repeat step 2 while the size of the heap is greater than 1.

Heapify: this procedure calls itself recursively to move the largest value of a subtree to its root, pushing the smaller value down. The time complexity of heapify is O(log(n)) and the time complexity of building a heap is O(n). Heap sort calls heapify once for each of the n-1 extracted items, which gives an overall time complexity of O(n*log(n)).

def heapify(arr, n, i): 
    largest = i # Initialize largest as root 
    l = 2 * i + 1     # left = 2*i + 1 
    r = 2 * i + 2     # right = 2*i + 2 
    # left child  
    if l < n and arr[i] < arr[l]: 
        largest = l 
    # right child
    if r < n and arr[largest] < arr[r]: 
        largest = r 
    # change the root 
    if largest != i: 
        arr[i],arr[largest] = arr[largest],arr[i] # swap 
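        # Continue heapifying the affected subtree
        heapify(arr, n, largest)

def heapSort(arr):
    n = len(arr)
    # Build a max heap; nodes after index n//2 - 1 are leaves
    for i in range(n // 2 - 1, -1, -1):
        heapify(arr, n, i)
    # Move the current max to the end, then restore the heap on the rest
    for i in range(n-1, 0, -1):
        arr[i], arr[0] = arr[0], arr[i] # swap
        heapify(arr, i, 0)
    return arr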
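For comparison, the same idea can be sketched with Python's standard heapq module. Note the differences from the code above: heapq implements a min heap, so items are popped smallest first, and this version builds a new list instead of sorting in place. The function name is made up for this illustration.

```python
import heapq

def heap_sort_with_heapq(items):
    """Return a new sorted list using a min heap from the standard library."""
    heap = list(items)
    heapq.heapify(heap)          # build the heap in O(n)
    # Each heappop removes the smallest item in O(log(n)).
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heap_sort_with_heapq([12, 11, 13, 5, 6, 7]))
```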

Django Basics, Part 1

2019-01-20

Django directory levels

manage.py: command-line utility we use to interact with the Django project
project directory:
1. __init__.py: tells Python to treat the directory as a package
2. settings.py: Django configuration file
3. urls.py: URL configuration (routing)
4. wsgi.py: entry point for WSGI-compatible web servers to serve the project

Database (MySQL) config

dependencies used:
mysql 5.7.24 h56848d4_0
mysql-connector-c 6.1.11 hccea1a4_0
mysql-connector-python 8.0.17 py27h3febbb0_0 anaconda
mysql-python 1.2.5 py27h1de35cc_0 anaconda
pymysql 0.9.3 py27_0

In __init__.py, add the following code:

import pymysql
pymysql.install_as_MySQLdb()

In settings.py, add the database configuration used to connect to the database:

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',
        'NAME': '<database name>',
        'USER': '<user name>',
        'PASSWORD': '<password>',
        'HOST': 'localhost',
        'PORT': '3306',
    }
}
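One way to keep credentials out of settings.py is to read them from environment variables. This is only a sketch: the variable names (DJANGO_DB_NAME and so on) are made up for this example and are not something Django defines.

```python
import os

# Hypothetical environment variable names; pick whatever fits your deployment.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.mysql",
        "NAME": os.environ.get("DJANGO_DB_NAME", ""),
        "USER": os.environ.get("DJANGO_DB_USER", ""),
        "PASSWORD": os.environ.get("DJANGO_DB_PASSWORD", ""),
        "HOST": os.environ.get("DJANGO_DB_HOST", "localhost"),
        "PORT": os.environ.get("DJANGO_DB_PORT", "3306"),
    }
}
```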

Init new application

Create a new application:

python manage.py startapp <myApp>

The command above creates, among other files:
- admin.py: admin site configuration
- models.py: models
- views.py: views

In settings.py, add the app to INSTALLED_APPS:

INSTALLED_APPS = [
    'django.contrib.admin',
    'django.contrib.auth',
    'django.contrib.contenttypes',
    'django.contrib.sessions',
    'django.contrib.messages',
    'django.contrib.staticfiles',
    'myApp'
]

models.py file

The following is an example of models.py:

# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.db import models

# Create your models here.
class Grades(models.Model):
    gname = models.CharField(max_length=20)
    gdate = models.DateTimeField()
    ggirlnum = models.IntegerField()
    gboynum = models.IntegerField()
    isDelete = models.BooleanField(default=False)
    
    def __str__(self):
        return "%s-%d-%d"%(self.gname, self.gboynum, self.ggirlnum)


class Students(models.Model):
    sname = models.CharField(max_length=20)
    sgender = models.BooleanField(default=True)
    sage = models.IntegerField()
    scontend = models.CharField(max_length=20)
    isDelete = models.BooleanField(default=False)
    # Django 2.0+ also requires on_delete, e.g. on_delete=models.CASCADE
    sgrade = models.ForeignKey("Grades")
    
    def __str__(self):
        return "%s-%d-%s"%(self.sname, self.sage, self.scontend)

Create and run the migrations; this creates the database tables according to models.py:

python manage.py makemigrations    # create migration files
python manage.py migrate    # execute migration files

To add new entries with manage.py shell commands

Enter the shell:

python manage.py shell

Import packages in the shell:

from myApp.models import Grades, Students
from django.utils import timezone
from datetime import datetime

Add a new entry to the database:

grade1 = Grades()
grade1.gname = "something"
grade1.gdate = datetime(year=..., month=..., day=...)    # fill in the date
grade1.ggirlnum = 70
grade1.gboynum = 35
grade1.save()

Some other commands:

Grades.objects.all()    # get all data entries in Grades
g = Grades.objects.get(pk=1)    # get the Grades object with primary key 1
g.gboynum = 45
g.save()    # update existing data
stu1 = g.students_set.create(sname=...)     # add a new student with one line
g.delete()  # delete the object, including its row in the db

To run server

Execute:

python manage.py runserver ip:port

Admin:
- used to publish content
- used to add/modify/delete content
- requires ‘django.contrib.admin’ in INSTALLED_APPS
- enabled by default
- to open it, append “/admin” to the site URL and log in
- the language and time zone can be changed in settings.py

(to be continued…)

ZEXI JIN