Random Pandas Notes, Note 2
Reference: https://www.mikulskibartosz.name/how-to-split-a-list-inside-a-dataframe-cell-into-rows-in-pandas/
Given a table containing a column of lists, like:
id | name | tags |
---|---|---|
1 | winter squash | [‘60-minutes’, ‘time-to-make’, ‘healthy’] |
2 | braised pork | [‘2-hours-more’, ‘time-to-make’, ‘chinese’] |
3 | chilli Beef | [‘4-hours-more’, ‘chinese’] |
Like the ‘unwind’ function in mongodb, to turn it into:
id | name | tags |
---|---|---|
1 | winter squash | ‘60-minutes’ |
1 | winter squash | ‘time-to-make’ |
1 | winter squash | ‘healthy’ |
2 | braised pork | ‘2-hours-more’ |
2 | braised pork | ‘time-to-make’ |
2 | braised pork | ‘chinese’ |
3 | chilli Beef | ‘4-hours-more’ |
3 | chilli Beef | ‘chinese’ |
Here’s who we do it:
1 | tags_df['tags'] = tags_df.tags.apply(lambda x: x[1:-1].split(',')) |
Breaking it down, we have the following to turn the tags from a string into a list of strings.
1 | tags_df['tags'] = tags_df.tags.apply(lambda x: x[1:-1].split(',')) |
Next step, turning a list of tags into multiple columns:
1 | tags_df.tags.apply(pd.Series) |
It turns the chart like:
1 | 2 | 3 |
---|---|---|
‘60-minutes’ | ‘time-to-make’ | ‘healthy’ |
‘2-hours-more’ | ‘time-to-make’ | ‘chinese’ |
‘4-hours-more’ | ‘chinese’ | NaN |
Then, we join tags with the rest of the list:
1 | prev_df.merge(tags_df, right_index = True, left_index = True) |
Then, we drop the duplicated “tags” column and unwind different columns of the tags into different rows:
1 | prev_df.drop(["tags"], axis = 1).melt(id_vars = ['recipe_id'], value_name = "tags") |
Lastly, we remove the “variable” column, which we might not need:
1 | prev_df.drop("variable", axis = 1).dropna() |
Binary Tree Traversal, Part 1
outline:
- pre-order, post-order, in-order traversal of binary tree
- level-order traversal of binary tree
pre-order, post-order, in-order traversal of binary tree
pre-order:
1 | class Solution { |
post-order:
1 | class Solution { |
in-order:
1 | class Solution { |
level-order
recursive method:
1 | class Solution { |
iterative method:
1 | class Solution { |
Backtracking Leetcode Summary, Part 2
- Subsets
- Palindrome
- Combination Sum, repeated number used
- Combination Sum, unique number used
Subsets:
Problem Description: Given a set of distinct integers, nums, return all possible subsets (the power set). Note: The solution set must not contain duplicate subsets.
Few Key Points:
- This algorithm is very close to that of permutation. The only difference is that this counts for subsets instead of permutations.
- One of the key point is to keep the order, don’t permute like the permutation algorithm
Code:
backtracking part, takes parameters:
- result: LinkedList>
- current: ArrayList
- nums: nums to extract from
- cur_idx: current index
Note: when adding to result, convert the current list into a new listThe solution part:1
2
3
4
5
6
7
8
9
10
11public void backtrack(List<List<Integer>> l,
List<Integer> tmpList,
int [] nums,
int start) {
l.add(new ArrayList<>(tmpList));
for (int i = start; i < nums.length; i++) {
tmpList.add(nums[i]);
backtrack(l, tmpList, nums, i+1);
tmpList.remove(tmpList.size() - 1);
}
}1
2
3
4
5
6
7public List<List<Integer>> subsets(int[] nums) {
List<List<Integer>> res = new LinkedList<>();
List<Integer> tmp = new ArrayList<Integer>();
int start = 0;
backtrack(res, tmp, nums, start);
return res;
}
Palindrome:
Problem Description: Given a string s, partition s such that every substring of the partition is a palindrome. Return all possible palindrome partitioning of s.
e.g.:
Input: “aab”
Output:
[[“aa”,”b”],
[“a”,”a”,”b”]]
Few Key Points:
An utility function can be used to determine if str is palindrome or not.
Code:
backtrack:
1 | public void backtrack(List<List<String>> result, |
Check if str is palindrome:
1 | public boolean isPalindrome(String str, int s, int e) { |
The solution part:
1 | public List<List<String>> partition(String s) { |
Combination Sum, repeat allowed
Problem Description: Given a set of candidate numbers (candidates) (without duplicates) and a target number (target), find all unique combinations in candidates where the candidate numbers sums to target. The same repeated number may be chosen from candidates unlimited number of times.
e.g.:
Input: candidates = [2,3,6,7], target = 7,
A solution set is:
[[7],
[2,2,3]]
Codes:
backtrack:
1 | public void backtrack(List<List<Integer>> res, |
The solution part:
1 | public List<List<Integer>> combinationSum(int[] candidates, int target) { |
Combination Sum, no repeats
Problem Description: Same as combination sum, with only difference that each number can only be used once.
Codes:
backtrack: (added while loop to eliminate repeating results)
1 | public void backtrack(List<List<Integer>> res, |
The solution part is the same as the last section.
Random Pandas Notes, Note 1
To change a row value based on more than one conditions, use the following:
1 | df['A'] = np.where(((df['B'] == 'some_value') & (df['C'] == 'some_other_value')), true_value, false_value) |
Another equally efficient approach is using loc:
Note that this method does not give a default value when the condition is not met. So, the same code might be required to run twice or using .fillna() method.
1 | df.loc[(df.B == 'some_value') | (df.C == 'some_other_value'), 'A'] = true_value |
ps: for ‘and’ or ‘or’ operation, use symbol ‘&’ or ‘|’.
Backtracking Leetcode Summary, Part 1
- N-Queens
- Permutation
N-Queens:
Problem Description: The n-queens puzzle is the problem of placing n queens on an n×n chessboard such that no two queens attack each other. Given an integer n, return all distinct solutions to the n-queens puzzle. Each solution contains a distinct board configuration of the n-queens’ placement, where ‘Q’ and ‘.’ both indicate a queen and an empty space respectively.
Thought Process in Short: Backtracking with DFS. We know each row has one queen, so we move from top line to the bottom line line by line, checking each column, and two diagonals.
Few Key Points:
- For all “hill” diagonals row + column = const, and for all “dale” diagonals row - column = const.
- There could be the only one queen in a row and the only one queen in a column.
Algorithms:
- Start from the first row = 0.
- Iterate over the columns and try to put a queen in each column.
- If square (row, column) is not under attack
- Place the queen in (row, column) square.
- Exclude one row, one column and two diagonals from further consideration.
- If all rows are filled up row == N
- That means that we find out one more solution.
Else - Proceed to place further queens backtrack(row + 1).
- That means that we find out one more solution.
- Now backtrack : remove the queen from (row, column) square.
- If square (row, column) is not under attack
Codes:
Initialize log variables:
1 | int rows[]; |
Checking if a cell is under attack:
1 | public boolean isNotUnderAttack(int row, int col) { |
Method placing and removing queens:
1 | public void placeQueen(int row, int col) { |
Adding a solution to existing solutions:
1 | public void addSolution() { |
The actual backtracking process:
1 | public void backtrack (int row) { |
Main function of solution:
1 | public List<List<String>> solveNQueens(int n) { |
Permutation:
One of the more basic backtracking problem.
Algorithms:
- If the first integer to consider has index n that means that the current permutation is done.
- Iterate over the integers from index first to index n - 1.
- Place i-th integer first in the permutation, i.e. swap(nums[first], nums[i]).
- Proceed to create all permutations which starts from i-th integer : backtrack(first + 1).
- Now backtrack, i.e. swap(nums[first], nums[i]) back.
Code:
BFS to find solutions, then backtracking to reverse to previous states. The following is the backtracking part.
1 | public void backtrack(int n, |
Main function of solution:
1 | public List<List<Integer>> permute(int[] nums) { |
A Brief Summary of Sorting Algorithm, Part 1
- Basics:
- in-place sorting vs. out-place sorting
- internal sorting vs. external sorting
- stable vs. unstable sorting
- Bubble Sort
- Selection Sort
- Insertion Sort
- Merge Sort
- Quick Sort
- Heap Sort
Internal vs. external sorting
Internal sorting and external sorting describes where the sorting occurs:
- internal sorting located entirely in memory
- external sorting utilizes hard disk and external storage
Stable vs. unstable sorting
A sorting algorithm is said to be stable if two objects with equal keys appear in the same order in sorted output as they appear in the input array to be sorted.
- stable sorting algorithms includes:
- Bubble Sort
- Insertion Sort
- Merge Sort
- Count Sort
Bubble Sort
Bubble Sort is a type of stable sorting algorithm. The algorithm compares two elements that are next to each other and swap two element is the left one is larger than the right one. Time complexity of bubble sort is O(n*n).
1 | def bubbleSort(arr): |
Selection Sort
In every iteration of selection sort, the minimum element (considering ascending order) from the unsorted subarray is picked and moved to the sorted sub-array. Selection sort can be done stably. Time complexity of selection sort is O(n*n).
1 | def selectionSort(arr): |
Insertion Sort
Insertion sort is stable. In every iteration of insertion sort, the first element is selected and inserted into the correct location in the sorted half of the array. Time complexity of Insertion sort is O(n*n).
1 | def insertionSort(arr): |
Merge Sort
Merge sort is stable. Merge Sort is a Divide and Conquer algorithm. It divides input array in two halves, calls itself for the two halves and then merges the two sorted halves. Time complexity is O(n*log(n)).
1 | def mergeSort(arr): |
Quick Sort
For quick sort, we pick a random element as pivot. Compare each element with the pivot to create first half of the list smaller than the pivot and the second half larger than the pivot. After that, quick sort divide conquer two halves. Time complexity for quick sort is O(nlog(n)), worst case is O(nn). Quick sort can be made stable.
1 | def partition(arr,low,high): |
Heap Sort
- Build a max heap from the input data.
- At this point, the largest item is stored at the root of the heap. Replace it with the last item of the heap followed by reducing the size of heap by 1. Finally, heapify the root of tree.
- Repeat above steps while size of heap is greater than 1.
Heapify: this procedure calls itself recursively to move the max value to the top of the heap. Time complexity for heapify is O(log(n)) and time complexity for building a heap is O(n). Thus, heap sort gives the overall time complexity as O(n*log(n)).1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24def heapify(arr, n, i):
largest = i # Initialize largest as root
l = 2 * i + 1 # left = 2*i + 1
r = 2 * i + 2 # right = 2*i + 2
# left child
if l < n and arr[i] < arr[l]:
largest = l
# right child
if r < n and arr[largest] < arr[r]:
largest = r
# change the root
if largest != i:
arr[i],arr[largest] = arr[largest],arr[i] # swap
# Heapify upward
heapify(arr, n, largest)
def heapSort(arr):
n = len(arr)
for i in range(n, -1, -1):
heapify(arr, n, i)
for i in range(n-1, 0, -1):
arr[i], arr[0] = arr[0], arr[i] # swap
heapify(arr, i, 0)
return a
Multiple Thread (Runnable vs. Thread), Part 1
- Multi-threading states
- Java thread init methods
- Examples of Thread and Runnable
Multi-threading states
- new: 新建状态,保持这个状态直到程序start()
- ready: 调用了start(),就绪状态的线程处于就绪队列中,要等待JVM里线程调度器的调度
- running: 获取了CPU资源,执行run()里面的指令
- suspend: 失去所占用资源之后,该线程就从运行状态进入阻塞状态。在睡眠时间已到或获得设备资源后可以重新进入就绪状态。可以分为三种
- 等待阻塞: wait()
- 同步阻塞: 线程在获取synchronized同步锁失败后
- 其他阻塞: 其他阻塞:通过调用线程的sleep()或join()发出了I/O请求时,线程就会进入到阻塞状态
- dead: 一个运行状态的线程完成任务或者其他终止条件发生时,该线程就切换到终止状态
Java thread init methods
Java 提供了三种创建线程的方法
- 通过实现Runnable接口
- 通过继承Thread类本身
- 通过Callable和Future创建线程
Java thread example
1 | // class extending Thread |
Runnable Example
1 | import java.util.concurrent.locks.Lock; |
Django Basics, Part 1
Django directory levels
- manage.py: we use this to communicate with Django server
- project directory:
- __init__.py: tells Django to use directory as a python package
- settings.py: Django configuration file
- urls.py: url schema
- wsgi.py: project and wsgi integration web server
Database (MySQL) config
- dependencies used:
mysql 5.7.24 h56848d4_0
mysql-connector-c 6.1.11 hccea1a4_0
mysql-connector-python 8.0.17 py27h3febbb0_0 anaconda
mysql-python 1.2.5 py27h1de35cc_0 anaconda
pymysql 0.9.3 py27_0 - __init__.py:, add the following code:
1
2import pymysql
pymysql.install_as_MySQLdb() - in settings.py: add database configurations to connect to the database
1
2
3
4
5
6
7
8
9
10DATABASES = {
'default': {
'ENGINE': 'django.db.backends.mysql',
'NAME': # server name
'USER': # user name
'PASSWORD': # password
'HOST': 'localhost',
'PORT': '3306',
}
}
Init new application
- init new application:
1
python manage.py startapp <myApp>
- codes above creates:
- admin.py: website config
- models.py: model
- views.py: view
- In setting, add to INSTALLED_APP:
1
2
3
4
5
6
7
8
9INSTALLED_APPS = [
'django.contrib.admin',
'django.contrib.auth',
'django.contrib.contenttypes',
'django.contrib.sessions',
'django.contrib.messages',
'django.contrib.staticfiles',
'myApp'
]
models.py file
- the following is an example of models.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26# -*- coding: utf-8 -*-
from __future__ import unicode_literals
from django.db import models
# Create your models here.
class Grades(models.Model):
gname = models.CharField(max_length=20)
gdate = models.DateTimeField()
ggirlnum = models.IntegerField()
gboynum = models.IntegerField()
isDelete = models.BooleanField(default=False)
def __str__(self):
return "%s-%d-%d"%(self.gname, self.gboynum, self.ggrilnum)
class Students(models.Model):
sname = models.CharField(max_length=20)
sgender = models.BooleanField(default=True)
sage = models.IntegerField()
scontend = models.CharField(max_length=20)
isDelete = models.BooleanField(default=False)
sgrade = models.ForeignKey("Grades")
def __str__(self):
return "%s-%d-%s"%(self.sname, self.sage, self.scontend) - make migration and execute migration file: will create database according to models.py
1
2python manage.py makemigrations # create migration file
python manage.py migrate # execute migration file
To add new entries with manage.py shell commands
- enter shell:
1
python manage.py shell
- import packages in shell:
1
2
3from myApp.models import Grades, Students
from django.utils import timezone
from datetime import * - add new entry to db:
1
2
3
4
5
6grade1 = Grades()
grade1.gname = "something"
grade1.gdate = datetime(year=,month=,day=)
grade1.ggirlnum = 70
grade1.gboynum = 35
grade1.save() - some other command:
1
2
3
4
5
6Grades.objects.all() # get all data entries in Grades
g = Grades.objects.get(pk=1) # get the 1st object in Grades
g.gboynum = 45
g.save() # alternate existing data
d.delete() # delete, including the one in db
stu1 = g.students_set.create(sname=...) # add new student with one line
To run server
- execute:
1
python manage.py runserver ip:port
- Admin:
- publish content
- add/modify/delete content
- add “‘django.contrib.admin’” in INSTALLED_APPS
- exists by defualt
- add “/admin” and log in
- change language or timezone based on preference
(to be continued…)