
Sunday, June 28, 2015

mysql connections parallel insert



//=== node-mysql
https://github.com/felixge/node-mysql/
*** need multiple connections to execute queries in parallel; a single node-mysql connection runs its queries sequentially.


//=== python, mysql
http://themattreid.com/wordpress/2010/08/30/easy-python-threading-mysql-connections/

"""...
You will run into performance problems with the Python threading module because of the GIL.
... Eventlet for non-blocking asynchronous calls (eventlet.GreenPool and similar) or
using the multiprocessing Python 2.6+ module.


..."""




"...
... single threaded database operations are simply too slow. In these cases it’s a matter of course that you’ll use multi-threading or forking to spawn secondary processes to handle the database actions.

...



import threading
import sys
import MySQLdb
import pdb

class threader(threading.Thread):
    def __init__(self,db):
        threading.Thread.__init__(self)
        self.db = db
    def run(self):
        run_insert(self.db)

def run_insert(db):
    sql = "INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');"
    print "thread executing sql:%s"%(sql)
    try:
        cursor = db.cursor()
        cursor.execute(sql)
        cursor.close()
        db.commit()
    except:
        print "insert failed"

    return

def init_thread(): 
    backgrounds = []
    for db in connections:
        print "connection: %s"%(db)
        background = threader(db)
        background.start()
        backgrounds.append(background)

    for background in backgrounds:
        background.join()


def main():
    try:
        init_thread()
    except:
        print "failed to initiate threads"
    sys.exit(0)




if __name__ == "__main__":
    ....
    threads = int(4) #quantity of execution threads and size of connection pool

    connections = []
    for thread in range(threads):
        try:
            connections.append(MySQLdb.connect(host=mysql_host, user=mysql_user, passwd=mysql_pass, db=mysql_db, port=mysql_port))
            #pdb.set_trace()

        except MySQLdb.Error, e:
            print "Error %d: %s"%(e.args[0], e.args[1])
            pdb.set_trace()            
            sys.exit (1)


    main()



..."""




//=== http://stackoverflow.com/questions/12919047/mysql-connector-pooling-with-many-async-inserts
C#, mysql connector, pool

"""...
It is suggested to not use a global MySqlConnection object ...
Further, it's suggested to use the MySqlHelper instead of working with MySqlCommand objects.

...

MySqlHelper.ExecuteNonQuery(CONNECTION_STRING,
    @"INSERT INTO `table1` (`col3`,`col4`) VALUES (@col3, @col4);",
    (new List<MySqlParameter>() {
        new MySqlParameter("@col3", @"another value"),
        new MySqlParameter("@col4", @"some value")
    }).ToArray()
);
Now, imagine that I make the above call a few thousand times in async parallel tasks


added this in my connection string:
Pooling=True;minimumPoolSize=0;maximumpoolsize=100;

...
Yet, I still get max connections exceeded

need multiple connections to execute queries in parallel.
...
[on server side] In addition to max_connections, the max_user_connections is what is important here. It can be queried for:
show variables like 'max_user_connections';

...
To avoid exceeding max connections for your MySQL user you'll need to make sure the maximumpoolsize (in the connection string) is set to something lower than max_user_connections



..."""



*** python threading or multiprocessing ?
//=== http://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python

GIL: Global Interpreter Lock

"""...

... multiprocessing gets around the Global Interpreter Lock, but what other advantages are there

...
The threading module uses threads, the multiprocessing module uses processes.
The difference is that threads run in the same memory space,
while processes have separate memory.
This makes it a bit harder to share objects between processes with multiprocessing.
Since threads use the same memory, precautions have to be taken 
or two threads will write to the same memory at the same time. 
This is what the global interpreter lock is for.

Spawning processes is a bit slower than spawning threads. 
Once they are running, there is not much difference.


...
Threading's job is to enable applications to be responsive. Suppose you have a database connection and you need to respond to user input.
... By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures - good performance, plus a flexible software design.

Note that due to the GIL the app isn't actually doing two things at once, but what we've done is 
put the resource lock on the database into a separate thread so that 
CPU time can be switched between it and the user interaction. 
...


Multiprocessing is for times when you really do want more than one thing to be done at any given time. 
Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset. 
Putting each job in a separate thread might help a little because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel ... 

By putting each job in a Multiprocessing process, each can run on its own CPU and run at full efficiency.




..."""


Wednesday, June 24, 2015

mysql: do consecutive inserts on one connection lead to a race?


http://stackoverflow.com/questions/19283825/mysql-transactions-not-stopping-race-conditions-from-for-loop

SELECT ... FOR UPDATE to prevent a race condition?

"""...
var testTransaction = function (count) {
    connection.beginTransaction(function(err) {
        if (err) throw err;
        db.query('SELECT * FROM myTable WHERE id = 1 FOR UPDATE', function(err, rows, result) {
            if (err) {
                connection.rollback(function() {
                    throw err;
                });
            }
            connection.query('UPDATE myTable SET myField=? WHERE id=1', (count + 1), function(err, result) {
                if (err) {
                    db.rollback(function() {

...
}

for (var i = 0; i < 2; i++) { testTransaction(i); } ..."""



//===================================
--> use a different connection for each transaction (testTransaction modified to take the connection as its first argument):

function conn() {
    var connection = mysql.createConnection({
        host     : 'localhost',
        user     : 'user',
        password : 'pass',
        database : 'test'
    });
    return connection;
}

for (var i = 0; i < 2; i++) { testTransaction(conn(), i); }



--> or serialize the transactions with a promise pattern:
transaction1.then(transaction2);
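
*** the same two fixes sketched in Python (PyMySQL assumed; myTable as in the question; credentials are placeholders): a fresh connection per transaction keeps them isolated, and SELECT ... FOR UPDATE locks the row inside each transaction.

import pymysql


def conn():
    return pymysql.connect(host="localhost", user="user",
                           password="pass", database="test")


def test_transaction(db, count):
    try:
        db.begin()
        with db.cursor() as cur:
            cur.execute("SELECT * FROM myTable WHERE id = 1 FOR UPDATE")
            cur.execute("UPDATE myTable SET myField=%s WHERE id=1",
                        (count + 1,))
        db.commit()
    except pymysql.MySQLError:
        db.rollback()
        raise


for i in range(2):
    test_transaction(conn(), i)  # one connection per transaction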


mysql multiple-row insert, single-row insert, or transaction insert


http://stackoverflow.com/questions/1793169/which-is-faster-multiple-single-inserts-or-one-multiple-row-insert

"""...
Which is faster: multiple single INSERTs or one multiple-row INSERT?

...
use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.

...
in general, multiple inserts will be slower because of the connection overhead. Doing multiple inserts at once will reduce the cost of overhead per insert.


..."""


INSERT INTO tablename (col1, col2) VALUES (1, 2), (3, 4), (5, 6), (7, 8)
Which will insert 4 rows.
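
*** From Python, cursor.executemany() gives the same effect: MySQLdb/PyMySQL rewrite an executemany over INSERT ... VALUES into one multiple-row statement. A sketch (PyMySQL assumed; table/credentials are placeholders):

import pymysql

db = pymysql.connect(host="localhost", user="user",
                     password="pass", database="test")
rows = [(1, 2), (3, 4), (5, 6), (7, 8)]
with db.cursor() as cur:
    # sent as one INSERT ... VALUES (1,2),(3,4),(5,6),(7,8)
    cur.executemany("INSERT INTO tablename (col1, col2) VALUES (%s, %s)",
                    rows)
db.commit()
db.close()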



//=== https://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
"""...
The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:

Connecting: (3)

Sending query to server: (2)

Parsing query: (2)

Inserting row: (1 × size of row)

Inserting indexes: (1 × number of indexes)

Closing: (1)
..."""



//=== https://github.com/felixge/node-mysql/
"""...
connection.beginTransaction(function(err) {
    if (err) { throw err; }
    connection.query('INSERT INTO posts SET title=?', title, function(err, result) {
        if (err) {
            connection.rollback(function() {
                throw err;
            });
        }

        var log = 'Post ' + result.insertId + ' added';

        connection.query('INSERT INTO log SET data=?', log, function(err, result) {
            if (err) {
                connection.rollback(function() {
                    throw err;
                });
            }
            connection.commit(function(err) {
                if (err) {
                    connection.rollback(function() {
                        throw err;
                    });
...
..."""


grep to find word in files


$ man grep

-r search recursively if the given path is a directory
-i case-insensitive search
-n print the line number

-H print the filename [implied when multiple files are searched, i.e. no need to add -H
when the given path stands for multiple files]

-I ignore binary files (complement: -a treat all files as text)
-F treat search term as a literal, not a regular expression


--exclude=GLOB
Skip files whose base name matches GLOB (using wildcard
matching). A file-name glob can use *, ?, and [...] as
wildcards, and \ to quote a wildcard or backslash character
literally.

--exclude-from=FILE
Skip files whose base name matches any of the file-name globs
read from FILE (using wildcard matching as described under
--exclude).

--exclude-dir=DIR
Exclude directories matching the pattern DIR from recursive searches.
[ e.g., to exclude .svn and .git
xxx $ grep -r "non-sense" --exclude-dir='\.svn|\.git' (--exclude-dir takes a glob, not a regex)
--> $ grep -r "non-sense" --exclude-dir=.svn --exclude-dir=.git

http://superuser.com/questions/66715/recursive-grep-exclude-specific-directories

http://stackoverflow.com/questions/19962645/bash-function-grep-exclude-dir-not-working
older versions of grep may not support --exclude-dir.

]


-E, --extended-regexp
-e PATTERN, --regexp=PATTERN
Use PATTERN as the pattern. This can be used to specify
multiple search patterns, or to protect a pattern beginning with
a hyphen (-). (-e is specified by POSIX.)



-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.




-w, --word-regexp [to match only whole words, i.e. delimited by non-word characters]
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.


-L, --files-without-match [only print filenames that are not matched]
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.

-l, --files-with-matches [only print the matching filenames]
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)


[examples]
$ grep -r word_to_find .
[to search "word_to_find" recursively under the current directory ]


$ grep -n "> tmp1.txt" *.sh
[to find string "> tmp1.txt" in all files with extension name ".sh" in the current folder, not recursively]

xxx a common mistake
$ grep myutil.js .
grep: .: Is a directory
xxx

-->
$ grep -r myutil.js .
$ grep myutil.js *.js
$ grep "myutil.js" *.js

$ grep *.js *.js
[careful: the unquoted *.js pattern is glob-expanded by the shell before grep runs]
$ grep ".*js" *.js
[here, ".*js" is quoted, so grep receives it intact as a regex]

$ grep -l "> tmp1.txt" *.sh
[list all the names of files which contain "> tmp1.txt" ]

$ grep -r "myutil.js" . --exclude-dir=.git
$ grep -r "myutil.js" . --exclude-dir=?git
[though works as well, it means to exclude all subfolders whose name is even like agit, bgit, mgit, .git, etc. ]



[ref]
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_02.html

bash regex
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_01.html
http://tldp.org/LDP/abs/html/x17129.html

http://ubuntu.elfscript.com/2013/09/ubuntu.html

Monday, June 15, 2015

mongo, mongodb client shell, query



//=== http://docs.mongodb.org/manual/tutorial/getting-started-with-the-mongo-shell/
$ mongo
> db
> use another_db
> db.collectionName.find();

//=== http://docs.mongodb.org/v2.2/reference/method/db.collection.find/

db.products.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 } )

This operation returns all the documents from the collection "products"
where "qty" is greater than 25.
The documents in the results include only the
"_id", "item", and "qty" fields,

using an "inclusion" projection given by the second parameter { item: 1, qty: 1 }.

find() always returns the _id field, even when not explicitly included:

Results:
{ "_id" : 11, "item" : "pencil", "qty" : 50 }
{ "_id" : ObjectId("50634d86be4617f17bb159cd"), "item" : "bottle", "qty" : 30 }
{ "_id" : ObjectId("50634dbcbe4617f17bb159d0"), "item" : "paper", "qty" : 100}

"exlcude" projection
db.products.find( { qty: { $gt: 25 } }, { _id: 0, qty: 0 } )

Results:
{ "item" : "pencil", "type" : "no.2" }
{ "item" : "bottle", "type" : "blue" }
{ "item" : "paper" }



//=== http://docs.mongodb.org/manual/tutorial/write-scripts-for-the-mongo-shell/

*** mongo client shell
$ mongo
> db = connect("localhost:27017/db1");
> show collections;
> db.collect1.insert({k1:'v1', k2:'v2', k3:'v3'});

> db=connect("db2");
> show collections;

Differences Between Interactive and Scripted mongo ?


To set the db global variable, use the getDB() method or the connect() method. You can assign the database reference to a variable other than db.

You cannot use any shell helper (e.g. use <db>, show dbs, etc.) inside the JavaScript file because they are not valid JavaScript.


Shell Helpers --> JavaScript Equivalents

show dbs, show databases --> db.adminCommand('listDatabases')

use <db> --> db = db.getSiblingDB('<db>')

show collections --> db.getCollectionNames()

show users --> db.getUsers()

show roles --> db.getRoles({showBuiltinRoles: true})

show log <logname> --> db.adminCommand({ 'getLog' : '<logname>' })

show logs --> db.adminCommand({ 'getLog' : '*' })



it -->
cursor = db.collection.find()
if ( cursor.hasNext() ){
    cursor.next();
}
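
*** there is no `it` helper outside the shell; a pymongo cursor is a plain Python iterator (sketch; collection name is a placeholder):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["test"]
cursor = db["collection"].find()
for doc in cursor:  # the shell's `it` pages through this same cursor
    print(doc)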





mongodb adaptor for nodejs


//=== http://stackoverflow.com/questions/24736999/how-can-i-require-mongodb-within-casperjs

install mongodb driver for nodejs
$ npm install mongodb

* this mongodb module is just an adaptor for nodejs to access an actual mongodb server;
you have to set up the mongodb server separately from the module.



//=== https://mongodb.github.io/node-mongodb-native/api-generated/mongoclient.html

MongoClient class's instance member functions:
connect(), open(), close(), db()

connect(url[, options], callback)
open(callback)
close(callback)
db(dbName)



"""...
var Db = require('mongodb').Db,
    MongoClient = require('mongodb').MongoClient,
    Server = require('mongodb').Server,
    ReplSetServers = require('mongodb').ReplSetServers,
    ObjectID = require('mongodb').ObjectID,
    Binary = require('mongodb').Binary,
    GridStore = require('mongodb').GridStore,
    Grid = require('mongodb').Grid,
    Code = require('mongodb').Code,
    BSON = require('mongodb').pure().BSON,
    assert = require('assert');

// Set up the connection to the local db
var mongoclient = new MongoClient(new Server("localhost", 27017), {native_parser: true});

// Open the connection to the server
mongoclient.open(function(err, mongoclient) {

    // Get the first db and do an update document on it
    var db = mongoclient.db("integration_tests");
    db.collection('mongoclient_test').update({a:1}, {b:1}, {upsert:true}, function(err, result) {
        assert.equal(null, err);
        assert.equal(1, result);

        // Get another db and do an update document on it
        var db2 = mongoclient.db("integration_tests2");
        db2.collection('mongoclient_test').update({a:1}, {b:1}, {upsert:true}, function(err, result) {
            assert.equal(null, err);
            assert.equal(1, result);

            // Close the connection
            mongoclient.close();
        });
    });
});

..."""


//=== static member function, MongoClient.connect(url, cb)
Connect to MongoDB using a url as documented


var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');
var ObjectId = require('mongodb').ObjectID;
var url = 'mongodb://myMongo1:27017/test';

// the db handle is only valid inside the connect callback;
// calling db.collection(...) before connecting cannot work
MongoClient.connect(url, function(err, db) {
    assert.equal(null, err);
    db.collection('restaurants1').insertOne({name: 'r1'}, function(err, result) {  // {name:'r1'} is a sample document
        assert.equal(null, err);
        db.close();
    });
});







Wednesday, June 10, 2015

nodejs and phantomjs


//=== phantomjs-node module for nodejs
https://github.com/sgentle/phantomjs-node
"""...
This module expects the phantomjs binary to be in PATH somewhere. In other words, type this:

$ phantomjs
If that works, so will phantomjs-node.

..."""

0. verify that the phantomjs binary already works by
$ phantomjs --version

1. then install phantomjs-node module for nodejs by
$ npm install phantom


[ex]
var phantom = require('phantom');

phantom.create(function (ph) {
    ph.createPage(function (page) {
        page.open("http://www.google.com", function (status) {
            console.log("opened google? ", status);
            page.evaluate(function () { return document.title; }, function (result) {
                console.log('Page title is ' + result);
                ph.exit();
            });
        });
    });
});



"""...
... How do you communicate with a process that doesn't support shared memory, sockets, FIFOs, or standard input?

Well, there's one thing PhantomJS does support, and that's opening webpages. In fact, it's really good at opening web pages. So we communicate with PhantomJS by spinning up an instance of ExpressJS, opening Phantom in a subprocess, and pointing it at a special webpage that turns socket.io messages into alert() calls. Those alert() calls are picked up by Phantom and there you go!

The communication itself happens via James Halliday's fantastic dnode library, which fortunately works well enough when combined with browserify to run straight out of PhantomJS's pidgin Javascript environment.

..."""



//=== phantomjs module for nodejs
https://www.npmjs.com/package/phantomjs
[ An NPM wrapper for PhantomJS ]


$ npm install phantomjs

Or grab the source and

$ node ./install.js

What this installer is really doing is just grabbing a particular "blessed" version of PhantomJS

"""...
PhantomJS is not a library for NodeJS. It's a separate environment and code written for node is unlikely to be compatible. In particular PhantomJS does not expose a Common JS package loader.

This is an 'NPM wrapper' and can be used to conveniently make Phantom available.
It is not a 'NodeJS wrapper'.

... drive from within a node program by spawning phantom in a child process.

... tell NPM and the PhantomJS installer to skip validation of ssl keys with NPM's strict-ssl setting:

$ npm set strict-ssl false



..."""


//=== node-phantom module for nodejs
https://www.npmjs.com/package/node-phantom
[ This is a bridge between PhantomJs and Node.js. ]

0. The bridge assumes that the "phantomjs" binary is available in the PATH.

1. The only other dependency is socket.io

*** need to manually install socket.io and node-phantom

[ex]
var phantom = require('node-phantom');
phantom.create(function(err, ph) {
    return ph.createPage(function(err, page) {
        // completion of the truncated example, following the same callback style
        return page.open("http://www.google.com", function(err, status) {
            console.log("opened google? ", status);
            ph.exit();
        });
    });
});

//=== phantomjs on nodejs-docker
[err] phantomjs: error while loading shared libraries: libjpeg.so.8: cannot open shared object file: No such file or directory


Monday, June 1, 2015

how to transfer a docker image from one computer to another


http://stackoverflow.com/questions/23935141/how-to-copy-docker-images-from-one-host-to-another-without-via-repository
http://it.3hd.me/2015/03/docker-exportimport-pullcommit-saveload.html


//=== save/load

0. save the docker image as a tar file on pc1

$ docker save -o savefname.tar imgname
or
$ docker save imgname > savefname.tar


1. scp savefname.tar to pc2
[to save bandwidth, compress-> send-> decompress is recommended]

2. load savefname.tar to image on pc2
$ docker load < savefname.tar
or
$ docker load -i savefname.tar


3. verify 'imgname' is restored on pc2
$ docker images -a imgname



//=== export/import
[operating on container(running or not), the resultant img will lose previous history and layers]

0. on pc1:
$ docker export container1 > c1.tar

1. copy c1.tar from pc1 to pc2

2. on pc2:
$ cat c1.tar | docker import - new_imgname:tagname




//=== http://unix.stackexchange.com/questions/70581/scp-and-compress-at-the-same-time-no-intermediate-save
[Q] does scp compress files implicitly?

"""...
scp itself can compress the data

-C Compression enable. ...

$ scp -C source user@server:/path/to/backup


...
7z should not be used for backups on Linux. As stated on the 7z man page:

DO NOT USE the 7-zip format for backup purpose on Linux/Unix because :
- 7-zip does not store the owner/group of the file.


..."""


$ tar zcvf - MyBackups | ssh user@server "cat > /path/to/backup/foo.tgz"