python
Random Pandas Notes, Note 2
Reference: https://www.mikulskibartosz.name/how-to-split-a-list-inside-a-dataframe-cell-into-rows-in-pandas/
Given a table containing a column of lists, like:
id | name | tags |
---|---|---|
1 | winter squash | [‘60-minutes’, ‘time-to-make’, ‘healthy’] |
2 | braised pork | [‘2-hours-more’, ‘time-to-make’, ‘chinese’] |
3 | chilli Beef | [‘4-hours-more’, ‘chinese’] |
We want to turn it into one row per tag, similar to the `$unwind` stage in MongoDB:
id | name | tags |
---|---|---|
1 | winter squash | ‘60-minutes’ |
1 | winter squash | ‘time-to-make’ |
1 | winter squash | ‘healthy’ |
2 | braised pork | ‘2-hours-more’ |
2 | braised pork | ‘time-to-make’ |
2 | braised pork | ‘chinese’ |
3 | chilli Beef | ‘4-hours-more’ |
3 | chilli Beef | ‘chinese’ |
Here's how we do it.
Breaking it down, we first turn the tags from a string into a list of strings. Note that splitting on ',' alone leaves the quote characters and leading spaces in each tag.
    tags_df['tags'] = tags_df.tags.apply(lambda x: x[1:-1].split(','))
Next step, turning each list of tags into multiple columns:
    tags_df.tags.apply(pd.Series)
This produces a table like the following (pandas numbers the new columns from 0):
0 | 1 | 2 |
---|---|---|
‘60-minutes’ | ‘time-to-make’ | ‘healthy’ |
‘2-hours-more’ | ‘time-to-make’ | ‘chinese’ |
‘4-hours-more’ | ‘chinese’ | NaN |
Then, we join the tag columns with the rest of the table on the index:
    prev_df.merge(tags_df, right_index = True, left_index = True)
Then, we drop the now-redundant "tags" column and melt the separate tag columns into rows:
    prev_df.drop(["tags"], axis = 1).melt(id_vars = ['recipe_id'], value_name = "tags")
Lastly, we remove the "variable" column that melt adds, which we might not need, and drop the rows left empty by shorter tag lists:
    prev_df.drop("variable", axis = 1).dropna()
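The steps above can be put together as one runnable sketch. The sample data mirrors the table at the top; the variable names (`df`, `wide`, `merged`, `result`), the extra quote stripping, and the `tag_` column prefix are illustrative assumptions rather than part of the original code. The last line shows `DataFrame.explode` (available from pandas 0.25) as a shorter alternative.

```python
import pandas as pd

# Sample data: the tags column holds the *string* form of a list.
df = pd.DataFrame({
    "recipe_id": [1, 2, 3],
    "name": ["winter squash", "braised pork", "chilli Beef"],
    "tags": [
        "['60-minutes', 'time-to-make', 'healthy']",
        "['2-hours-more', 'time-to-make', 'chinese']",
        "['4-hours-more', 'chinese']",
    ],
})

# Step 1: string -> list of strings (also strip spaces and quotes).
df["tags"] = df["tags"].apply(
    lambda x: [t.strip().strip("'") for t in x[1:-1].split(",")]
)

# Step 2: one column per list position; shorter lists are padded with NaN.
wide = df["tags"].apply(pd.Series).add_prefix("tag_")

# Step 3: join the tag columns back onto the other columns by index.
merged = df.drop(columns=["tags"]).merge(wide, left_index=True, right_index=True)

# Step 4: melt the tag columns into rows.
long = merged.melt(id_vars=["recipe_id", "name"], value_name="tags")

# Step 5: drop padding rows and the helper column.
result = (
    long.dropna()
    .sort_values(["recipe_id", "variable"])
    .drop(columns=["variable"])
    .reset_index(drop=True)
)

# Alternative: explode does steps 2 to 5 in one call.
exploded = df.explode("tags").reset_index(drop=True)
```

Both `result` and `exploded` contain one row per (recipe, tag) pair.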
Random Pandas Notes, Note 1
To set a column's value based on more than one condition, use the following:
    df['A'] = np.where(((df['B'] == 'some_value') & (df['C'] == 'some_other_value')), true_value, false_value)
Another equally efficient approach is using loc:
    df.loc[(df.B == 'some_value') | (df.C == 'some_other_value'), 'A'] = true_value
Note that this method does not assign a default value to rows where the condition is not met. You may need to run a second assignment with the negated condition, or fill the remaining rows with .fillna().
PS: combine conditions with '&' for 'and' and '|' for 'or', and wrap each condition in parentheses.
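A small sketch comparing the two approaches on the same condition. The column contents and the values "hit" and "miss" are made up for illustration; they stand in for true_value and false_value.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "B": ["x", "x", "y"],
    "C": ["p", "q", "p"],
})

# np.where assigns a value to every row: one for matches, one for the rest.
df["A"] = np.where((df["B"] == "x") & (df["C"] == "p"), "hit", "miss")

# loc only touches matching rows, so set the default value first.
df["A2"] = "miss"
df.loc[(df["B"] == "x") & (df["C"] == "p"), "A2"] = "hit"
```

Here both columns end up identical, because the loc version sets its default explicitly before the conditional assignment.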
A Brief Summary of Sorting Algorithm, Part 1
- Basics:
- in-place sorting vs. out-of-place sorting
- internal sorting vs. external sorting
- stable vs. unstable sorting
- Bubble Sort
- Selection Sort
- Insertion Sort
- Merge Sort
- Quick Sort
- Heap Sort
Internal vs. external sorting
Internal sorting and external sorting describe where the sorting occurs:
- internal sorting happens entirely in memory
- external sorting uses the hard disk or other external storage, because the data does not fit in memory
Stable vs. unstable sorting
A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted.
- stable sorting algorithms include:
- Bubble Sort
- Insertion Sort
- Merge Sort
- Counting Sort
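To make the definition concrete, here is a small illustration using Python's built-in `sorted`, which is guaranteed to be stable. The records are made-up sample data.

```python
# (name, count) pairs; we sort by count only.
records = [("apple", 2), ("pear", 1), ("fig", 2), ("kiwi", 1)]

by_count = sorted(records, key=lambda r: r[1])

# Records with equal counts keep their input order:
# "pear" stays before "kiwi", and "apple" stays before "fig".
print(by_count)
```

An unstable algorithm would be free to return "kiwi" before "pear", since both have the same key.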
Bubble Sort
Bubble Sort is a stable sorting algorithm. It repeatedly compares two adjacent elements and swaps them if the left one is larger than the right one. Time complexity of bubble sort is O(n*n).
    def bubbleSort(arr):
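Only the signature of the listing survives above. A minimal sketch of the algorithm as described, assuming an in-place ascending sort; the name `bubble_sort` and the early-exit flag are illustrative choices, not taken from the original:

```python
def bubble_sort(arr):
    """Sort arr in place in ascending order and return it."""
    n = len(arr)
    for i in range(n - 1):
        swapped = False
        # After i passes, the last i elements are already in place.
        for j in range(n - 1 - i):
            if arr[j] > arr[j + 1]:
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        # No swaps in a full pass means the list is sorted.
        if not swapped:
            break
    return arr
```

Because only strictly larger left neighbours are swapped, equal elements never pass each other, which is what keeps the sort stable.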
Selection Sort
In every iteration of selection sort, the minimum element (for ascending order) of the unsorted subarray is picked and moved to the end of the sorted subarray. The usual swap-based version is not stable, but selection sort can be made stable by inserting the minimum instead of swapping it. Time complexity of selection sort is O(n*n).
    def selectionSort(arr):
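Only the signature of the listing survives above. A minimal sketch of the common swap-based variant, assuming an in-place ascending sort; the name `selection_sort` is illustrative:

```python
def selection_sort(arr):
    """Sort arr in place in ascending order and return it."""
    n = len(arr)
    for i in range(n - 1):
        # Find the index of the smallest element in arr[i:].
        min_idx = i
        for j in range(i + 1, n):
            if arr[j] < arr[min_idx]:
                min_idx = j
        # Move it to the end of the sorted prefix arr[:i].
        if min_idx != i:
            arr[i], arr[min_idx] = arr[min_idx], arr[i]
    return arr
```

The swap can move an element past an equal one, which is why this variant is not stable.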
Insertion Sort
Insertion sort is stable. In every iteration, the first element of the unsorted part is selected and inserted into its correct position in the sorted part of the array. Time complexity of insertion sort is O(n*n).
    def insertionSort(arr):
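Only the signature of the listing survives above. A minimal sketch of the algorithm as described, assuming an in-place ascending sort; the name `insertion_sort` is illustrative:

```python
def insertion_sort(arr):
    """Sort arr in place in ascending order and return it."""
    for i in range(1, len(arr)):
        key = arr[i]  # first element of the unsorted part
        j = i - 1
        # Shift larger elements of the sorted part one slot to the right.
        while j >= 0 and arr[j] > key:
            arr[j + 1] = arr[j]
            j -= 1
        arr[j + 1] = key
    return arr
```

Elements equal to `key` are not shifted, so equal elements keep their relative order.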
Merge Sort
Merge sort is stable. Merge Sort is a divide-and-conquer algorithm: it divides the input array into two halves, calls itself on each half, and then merges the two sorted halves. Time complexity is O(n*log(n)).
    def mergeSort(arr):
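Only the signature of the listing survives above. A minimal sketch of the algorithm as described; unlike the in-place sketches, this version returns a new list, and the name `merge_sort` is illustrative:

```python
def merge_sort(arr):
    """Return a new list with the elements of arr in ascending order."""
    if len(arr) <= 1:
        return list(arr)
    mid = len(arr) // 2
    left = merge_sort(arr[:mid])
    right = merge_sort(arr[mid:])

    # Merge the two sorted halves.
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        # "<=" takes from the left half on ties, which keeps the sort stable.
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```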
Quick Sort
For quick sort, we pick a random element as the pivot. We compare each element with the pivot to partition the list so that elements smaller than the pivot come first and larger elements come after it. Quick sort then recursively sorts the two partitions. Time complexity for quick sort is O(n*log(n)) on average and O(n*n) in the worst case. Quick sort can be made stable.
    def partition(arr,low,high):
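Only the signature of the listing survives above. A minimal sketch under stated assumptions: it uses the Lomuto partition scheme with a randomly chosen pivot, sorts in place, and is not stable; the `quick_sort` wrapper and its default arguments are illustrative:

```python
import random


def partition(arr, low, high):
    """Partition arr[low:high + 1] around a random pivot; return its final index."""
    pivot_idx = random.randint(low, high)
    arr[pivot_idx], arr[high] = arr[high], arr[pivot_idx]
    pivot = arr[high]
    i = low - 1  # end of the "less than or equal to pivot" region
    for j in range(low, high):
        if arr[j] <= pivot:
            i += 1
            arr[i], arr[j] = arr[j], arr[i]
    # Put the pivot between the two regions.
    arr[i + 1], arr[high] = arr[high], arr[i + 1]
    return i + 1


def quick_sort(arr, low=0, high=None):
    """Sort arr in place in ascending order and return it."""
    if high is None:
        high = len(arr) - 1
    if low < high:
        p = partition(arr, low, high)
        quick_sort(arr, low, p - 1)
        quick_sort(arr, p + 1, high)
    return arr
```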
Heap Sort
- Build a max heap from the input data.
- At this point, the largest item is stored at the root of the heap. Replace it with the last item of the heap followed by reducing the size of heap by 1. Finally, heapify the root of tree.
- Repeat above steps while size of heap is greater than 1.
Heapify: this procedure sifts the value at index i down, calling itself recursively, until the subtree rooted at i satisfies the max-heap property. Time complexity for heapify is O(log(n)) and time complexity for building a heap is O(n). Since the sort phase calls heapify n - 1 times, heap sort has an overall time complexity of O(n*log(n)).
    def heapify(arr, n, i):
        largest = i      # initialize largest as root
        l = 2 * i + 1    # left child index
        r = 2 * i + 2    # right child index
        # left child is larger than the root
        if l < n and arr[largest] < arr[l]:
            largest = l
        # right child is larger than the largest so far
        if r < n and arr[largest] < arr[r]:
            largest = r
        # the root is not the largest: swap and fix the affected subtree
        if largest != i:
            arr[i], arr[largest] = arr[largest], arr[i]  # swap
            # heapify the subtree the old root moved into
            heapify(arr, n, largest)

    def heapSort(arr):
        n = len(arr)
        # build a max heap, starting from the last non-leaf node
        for i in range(n // 2 - 1, -1, -1):
            heapify(arr, n, i)
        # move the current max to the end, then restore the smaller heap
        for i in range(n - 1, 0, -1):
            arr[i], arr[0] = arr[0], arr[i]  # swap
            heapify(arr, i, 0)
        return arr
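For comparison, the same idea can be sketched with the standard library's `heapq` module. This is a different variant from the listing above: `heapq` implements a min-heap, so the smallest item is popped first and the result is built in a new list. The function name `heap_sort_with_heapq` is illustrative.

```python
import heapq


def heap_sort_with_heapq(items):
    """Return a new ascending list using a min-heap."""
    heap = list(items)
    heapq.heapify(heap)  # build the heap in O(n)
    # Each pop removes the smallest remaining item in O(log(n)).
    return [heapq.heappop(heap) for _ in range(len(heap))]
```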
Django Basics, Part 1
Django directory levels
- manage.py: the command-line utility we use to interact with the Django project
- project directory:
- __init__.py: marks the directory as a Python package
- settings.py: Django configuration file
- urls.py: URL configuration (URL patterns)
- wsgi.py: entry point that WSGI-compatible web servers use to serve the project
Database (MySQL) config
- dependencies used:
    mysql 5.7.24 h56848d4_0
    mysql-connector-c 6.1.11 hccea1a4_0
    mysql-connector-python 8.0.17 py27h3febbb0_0 anaconda
    mysql-python 1.2.5 py27h1de35cc_0 anaconda
    pymysql 0.9.3 py27_0
- in __init__.py, add the following code:
    import pymysql
    pymysql.install_as_MySQLdb()
- in settings.py, add the database configuration used to connect to the database:
    DATABASES = {
        'default': {
            'ENGINE': 'django.db.backends.mysql',
            'NAME': '<database name>',
            'USER': '<user name>',
            'PASSWORD': '<password>',
            'HOST': 'localhost',
            'PORT': '3306',
        }
    }
Init new application
- init new application:
    python manage.py startapp <myApp>
- the command above creates:
- admin.py: admin site configuration
- models.py: model
- views.py: view
- In settings.py, add the app to INSTALLED_APPS:
    INSTALLED_APPS = [
        'django.contrib.admin',
        'django.contrib.auth',
        'django.contrib.contenttypes',
        'django.contrib.sessions',
        'django.contrib.messages',
        'django.contrib.staticfiles',
        'myApp'
    ]
models.py file
- the following is an example of models.py:
    # -*- coding: utf-8 -*-
    from __future__ import unicode_literals
    from django.db import models

    # Create your models here.
    class Grades(models.Model):
        gname = models.CharField(max_length=20)
        gdate = models.DateTimeField()
        ggirlnum = models.IntegerField()
        gboynum = models.IntegerField()
        isDelete = models.BooleanField(default=False)

        def __str__(self):
            return "%s-%d-%d" % (self.gname, self.gboynum, self.ggirlnum)

    class Students(models.Model):
        sname = models.CharField(max_length=20)
        sgender = models.BooleanField(default=True)
        sage = models.IntegerField()
        scontend = models.CharField(max_length=20)
        isDelete = models.BooleanField(default=False)
        # Django 2.0+ also requires an on_delete argument here
        sgrade = models.ForeignKey("Grades")

        def __str__(self):
            return "%s-%d-%s" % (self.sname, self.sage, self.scontend)
- make and execute the migration file; this creates the database tables according to models.py:
    python manage.py makemigrations # create migration file
    python manage.py migrate # execute migration file
To add new entries with manage.py shell commands
- enter shell:
    python manage.py shell
- import packages in shell:
    from myApp.models import Grades, Students
    from django.utils import timezone
    from datetime import *
- add new entry to db:
    grade1 = Grades()
    grade1.gname = "something"
    grade1.gdate = datetime(year=..., month=..., day=...)  # fill in the date
    grade1.ggirlnum = 70
    grade1.gboynum = 35
    grade1.save()
- some other commands:
    Grades.objects.all() # get all data entries in Grades
    g = Grades.objects.get(pk=1) # get the Grades object with primary key 1
    g.gboynum = 45
    g.save() # update existing data
    stu1 = g.students_set.create(sname=...) # add new student with one line
    g.delete() # delete the object, including its row in the db
To run server
- execute (ip:port is optional and defaults to 127.0.0.1:8000):
    python manage.py runserver ip:port
- Admin:
- publish content
- add/modify/delete content
- add 'django.contrib.admin' to INSTALLED_APPS
- it is there by default
- append "/admin" to the site URL and log in
- change the language or timezone in settings.py (LANGUAGE_CODE and TIME_ZONE) based on preference
(to be continued…)