Thursday, March 7, 2019

SQL Joins with On or Using


In a nutshell, you use ON for most things, but USING is a handy shorthand for the situation where the column names are the same.

Consider this example dataset:

mysql> select * from pets;
+---------+---------+--------+-----------+
| pets_id | animal  | name   | owners_id |
+---------+---------+--------+-----------+
|       1 | fox     | Rusty  |         2 |
|       2 | cat     | Fluffy |         2 |
|       3 | cat     | Smudge |         3 |
|       4 | cat     | Toffee |         3 |
|       5 | dog     | Pig    |         3 |
|       6 | hamster | Henry  |         1 |
|       7 | dog     | Honey  |         1 |
+---------+---------+--------+-----------+
7 rows in set (0.00 sec)

mysql> select * from owners;
+-----------+-------+
| owners_id | name  |
+-----------+-------+
|         1 | Susie |
|         2 | Sally |
|         3 | Sarah |
+-----------+-------+
3 rows in set (0.00 sec)

To find out who has which pets, we would join the two tables together like this:
mysql> select owners.name as owner, pets.name as pet, pets.animal
    -> from owners join pets on (pets.owners_id = owners.owners_id);
+-------+--------+---------+
| owner | pet    | animal  |
+-------+--------+---------+
| Sally | Rusty  | fox     |
| Sally | Fluffy | cat     |
| Sarah | Smudge | cat     |
| Sarah | Toffee | cat     |
| Sarah | Pig    | dog     |
| Susie | Henry  | hamster |
| Susie | Honey  | dog     |
+-------+--------+---------+
7 rows in set (0.00 sec)

The example above uses the ON keyword, but since the columns we use to join are called owners_id in both tables, then we can instead put in USING as a shorthand.

mysql> select owners.name as owner, pets.name as pet, pets.animal
    -> from owners join pets using (owners_id);

Here are the different types of the JOINs in SQL:
  • (INNER) JOIN: Returns records that have matching values in both tables
  • LEFT (OUTER) JOIN: Return all records from the left table, and the matched records from the right table
  • RIGHT (OUTER) JOIN: Return all records from the right table, and the matched records from the left table
  • FULL (OUTER) JOIN: Return all records when there is a match in either left or right table

Tuesday, March 5, 2019

practice libsvm

practice the guide.pdf saved in C:\Temp\learn_libsvm\test1_research_the_GUIDE

cd /media/sf_Temp/learn_libsvm/test3_test_libsvm/libsvm-3.23 && make

===========================================================================
this test is for demo purpose(use easy.py in one step):

use the data of project Astroparticle under https://www.csie.ntu.edu.tw/~cjlin/papers/guide/data/ ,
it has training dat as train.1 and test dat as test.1 :
save train.1 into svmguide1
save test.1 into  svmguide1.t

//feature|attr value range [-1, 1], 
./svm-scale -l -1 -u 1 -s range1 svmguide1 > svmguide1.scale
/*
1 1:2.617300e+01 2:5.886700e+01 3:-1.894697e-01 4:1.251225e+02
...
into
1 1:-0.823781 2:-0.783405 3:-0.233795 4:0.361305 
...
0 1:-0.885045 2:-0.949015 3:0.182549 4:-0.519316 

has 2 classes,  class label Infected or Clean
*/

./svm-scale -r range1 svmguide1.t > svmguide1.t.scale

./svm-train svmguide1.scale
/*
*
optimization finished, #iter = 496
nu = 0.202599
obj = -507.307046, rho = 2.627039
nSV = 630, nBSV = 621
Total nSV = 630
*/

./svm-predict svmguide1.t.scale svmguide1.scale.model svmguide1.t.predict
Accuracy = 96.15% (3846/4000) (classification)


+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

to achieve better Accuracy, ./svm-train requires cross-validation and specify following args:
-c -g 

the RBF kernel function argument C and galma

//the grid tools is a time consumer
./tools/grid.py svmguide1.scale
2.0 2.0 96.9893

./svm-train -c 2 -g 2 svmguide1.scale
Accuracy = 96.875%
===========================================================================
use easy.py in one step:
apt-get install -y gnuplot
cd tools/ && ./easy.py ../svmguide1 ../svmguide1.t

Scaling training data...
Cross validation...
Best c=2.0, g=2.0 CV rate=96.9893
Training...
Output model: svmguide1.model
Scaling testing data...
Testing...
Accuracy = 96.875% (3875/4000) (classification)
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++