
Sunday, June 28, 2015

mysql connections parallel insert



//=== node-mysql
https://github.com/felixge/node-mysql/
*** need multiple connections to execute queries in parallel; a single node-mysql connection runs its queries sequentially.


//=== python, mysql
http://themattreid.com/wordpress/2010/08/30/easy-python-threading-mysql-connections/

"""...
You will run into performance problems with the Python threading module because of the GIL.
... Eventlet for non-blocking asynchronous calls (eventlet.GreenPool and similar) or
using the multiprocessing Python 2.6+ module.


..."""




"...
... single threaded database operations are simply too slow. In these cases it’s a matter of course that you’ll use multi-threading or forking to spawn secondary processes to handle the database actions.

...



import threading
import sys
import MySQLdb
import pdb

class threader(threading.Thread):
    def __init__(self,db):
        threading.Thread.__init__(self)
        self.db = db
    def run(self):
        run_insert(self.db)

def run_insert(db):
    sql = "INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');"
    print "thread executing sql:%s"%(sql)
    try:
        cursor = db.cursor()
        cursor.execute(sql)
        cursor.close()
        db.commit()
    except:
        print "insert failed"

    return

def init_thread(): 
    backgrounds = []
    for db in connections:
        print "connection: %s"%(db)
        background = threader(db)
        background.start()
        backgrounds.append(background)

    for background in backgrounds:
        background.join()


def main():
    try:
        init_thread()
    except:
        print "failed to initiate threads"
    sys.exit(0)




if __name__ == "__main__":
    ....
    threads = int(4) #quantity of execution threads and size of connection pool

    connections = []
    for thread in range(threads):
        try:
            connections.append(MySQLdb.connect(host=mysql_host, user=mysql_user, passwd=mysql_pass, db=mysql_db, port=mysql_port))
            #pdb.set_trace()

        except MySQLdb.Error, e:
            print "Error %d: %s"%(e.args[0], e.args[1])
            pdb.set_trace()            
            sys.exit (1)


    main()



..."""




//=== http://stackoverflow.com/questions/12919047/mysql-connector-pooling-with-many-async-inserts
C#, mysql connector, pool

"""...
It is suggested to not use a global MySqlConnection object ...
Further, it's suggested to use the MySqlHelper instead of working with MySqlCommand objects.

...

MySqlHelper.ExecuteNonQuery(CONNECTION_STRING,
    @"INSERT INTO `table1` (`col3`,`col4`) VALUES (@col3, @col4);",
    (new List<MySqlParameter>() {
        new MySqlParameter("@col3", @"another value"),
        new MySqlParameter("@col4", @"some value")
    }).ToArray()
);
Now, imagine that I make the above call a few thousand times in async parallel tasks


added this in my connection string:
Pooling=True;minimumPoolSize=0;maximumpoolsize=100;

...
Yet, I still get max connections exceeded

need multiple connections to execute queries in parallel.
...
[on server side] In addition to max_connections, the max_user_connections is what is important here. It can be queried for:
show variables like 'max_user_connections';

...
To avoid exceeding max connections for your MySQL user you'll need to make sure the maximumpoolsize (in the connection string) is set to something lower than max_user_connections



..."""



*** python threading or multiprocessing ?
//=== http://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python

GIL: Global Interpreter Lock

"""...

... multiprocessing gets around the Global Interpreter Lock, but what other advantages are there

...
The threading module uses threads, the multiprocessing module uses processes.
The difference is that threads run in the same memory space,
while processes have separate memory.
This makes it a bit harder to share objects between processes with multiprocessing.
Since threads use the same memory, precautions have to be taken 
or two threads will write to the same memory at the same time. 
This is what the global interpreter lock is for.

Spawning processes is a bit slower than spawning threads. 
Once they are running, there is not much difference.


...
Threading's job is to enable applications to be responsive. Suppose you have a database connection and you need to respond to user input.
... By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures - good performance, plus a flexible software design.

Note that due to the GIL the app isn't actually doing two things at once, but what we've done is 
put the resource lock on the database into a separate thread so that 
CPU time can be switched between it and the user interaction. 
...


Multiprocessing is for times when you really do want more than one thing to be done at any given time. 
Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset. 
Putting each job in a separate thread might help a little because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel ... 

By putting each job in a Multiprocessing process, each can run on its own CPU and run at full efficiency.




..."""


Wednesday, June 24, 2015

mysql: do consecutive inserts on one connection lead to a race?


http://stackoverflow.com/questions/19283825/mysql-transactions-not-stopping-race-conditions-from-for-loop

SELECT ... FOR UPDATE to prevent a race condition?

"""...
var testTransaction = function (count) {
    connection.beginTransaction(function(err) {
        if (err) throw err;
        db.query('SELECT * FROM myTable WHERE id = 1 FOR UPDATE', function(err, rows, result) {
            if (err) {
                connection.rollback(function() {
                    throw err;
                });
            }
            connection.query('UPDATE myTable SET myField=? WHERE id=1', (count + 1), function(err, result) {
                if (err) {
                    db.rollback(function() {

...
}

for (var i = 0; i < 2; i++) { testTransaction(i); } ..."""



//===================================
--> use a different connection for each transaction (testTransaction modified to take the connection as its first argument):

function conn() {
    var connection = mysql.createConnection({
        host     : 'localhost',
        user     : 'user',
        password : 'pass',
        database : 'test'
    });
    return connection;
}

for (var i = 0; i < 2; i++) { testTransaction(conn(), i); }



--> or serialize the transactions with a promise pattern:
transaction1.then(transaction2);
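
*** the same two fixes sketched in Python (PyMySQL assumed; myTable as in the question; credentials are placeholders): a fresh connection per transaction keeps them isolated, and SELECT ... FOR UPDATE locks the row inside each transaction.

import pymysql


def conn():
    return pymysql.connect(host="localhost", user="user",
                           password="pass", database="test")


def test_transaction(db, count):
    try:
        db.begin()
        with db.cursor() as cur:
            cur.execute("SELECT * FROM myTable WHERE id = 1 FOR UPDATE")
            cur.execute("UPDATE myTable SET myField=%s WHERE id=1",
                        (count + 1,))
        db.commit()
    except pymysql.MySQLError:
        db.rollback()
        raise


for i in range(2):
    test_transaction(conn(), i)  # one connection per transaction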


mysql multiple-row insert, single-row insert, or transaction insert


http://stackoverflow.com/questions/1793169/which-is-faster-multiple-single-inserts-or-one-multiple-row-insert

"""...
Which is faster: multiple single INSERTs or one multiple-row INSERT?

...
use INSERT statements with multiple VALUES lists to insert several rows at a time. This is considerably faster (many times faster in some cases) than using separate single-row INSERT statements.

...
in general, multiple inserts will be slower because of the connection overhead. Doing multiple inserts at once will reduce the cost of overhead per insert.


..."""


INSERT INTO tablename (col1, col2) VALUES (1, 2), (3, 4), (5, 6), (7, 8)
Which will insert 4 rows.
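
*** From Python, cursor.executemany() gives the same effect: MySQLdb/PyMySQL rewrite an executemany over INSERT ... VALUES into one multiple-row statement. A sketch (PyMySQL assumed; table/credentials are placeholders):

import pymysql

db = pymysql.connect(host="localhost", user="user",
                     password="pass", database="test")
rows = [(1, 2), (3, 4), (5, 6), (7, 8)]
with db.cursor() as cur:
    # sent as one INSERT ... VALUES (1,2),(3,4),(5,6),(7,8)
    cur.executemany("INSERT INTO tablename (col1, col2) VALUES (%s, %s)",
                    rows)
db.commit()
db.close()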



//=== https://dev.mysql.com/doc/refman/5.0/en/insert-speed.html
"""...
The time required for inserting a row is determined by the following factors, where the numbers indicate approximate proportions:

Connecting: (3)

Sending query to server: (2)

Parsing query: (2)

Inserting row: (1 × size of row)

Inserting indexes: (1 × number of indexes)

Closing: (1)
..."""



//=== https://github.com/felixge/node-mysql/
"""...
connection.beginTransaction(function(err) {
    if (err) { throw err; }
    connection.query('INSERT INTO posts SET title=?', title, function(err, result) {
        if (err) {
            connection.rollback(function() {
                throw err;
            });
        }

        var log = 'Post ' + result.insertId + ' added';

        connection.query('INSERT INTO log SET data=?', log, function(err, result) {
            if (err) {
                connection.rollback(function() {
                    throw err;
                });
            }
            connection.commit(function(err) {
                if (err) {
                    connection.rollback(function() {
                        throw err;
                    });
...
..."""


grep to find word in files


$ man grep

-r search recursively if the given path is a directory
-i case-insensitive search
-n print the line number

-H print the filename [implied when multiple files are searched, i.e. no need to add -H
when the given path stands for multiple files]

-I ignore binary files (complement: -a treat all files as text)
-F treat search term as a literal, not a regular expression


--exclude=GLOB
Skip files whose base name matches GLOB (using wildcard
matching). A file-name glob can use *, ?, and [...] as
wildcards, and \ to quote a wildcard or backslash character
literally.

--exclude-from=FILE
Skip files whose base name matches any of the file-name globs
read from FILE (using wildcard matching as described under
--exclude).

--exclude-dir=DIR
Exclude directories matching the pattern DIR from recursive searches.
[ e.g., to exclude .svn and .git
xxx $ grep -r "non-sense" --exclude-dir='\.svn|\.git' (--exclude-dir takes a glob, not a regex)
--> $ grep -r "non-sense" --exclude-dir=.svn --exclude-dir=.git

http://superuser.com/questions/66715/recursive-grep-exclude-specific-directories

http://stackoverflow.com/questions/19962645/bash-function-grep-exclude-dir-not-working
older versions of grep may not support --exclude-dir.

]


-E, --extended-regexp
-e PATTERN, --regexp=PATTERN
Use PATTERN as the pattern. This can be used to specify
multiple search patterns, or to protect a pattern beginning with
a hyphen (-). (-e is specified by POSIX.)



-R, --dereference-recursive
Read all files under each directory, recursively. Follow all
symbolic links, unlike -r.




-w, --word-regexp [to match only whole words, i.e. delimited by non-word characters]
Select only those lines containing matches that form whole
words. The test is that the matching substring must either be
at the beginning of the line, or preceded by a non-word
constituent character. Similarly, it must be either at the end
of the line or followed by a non-word constituent character.
Word-constituent characters are letters, digits, and the
underscore.


-L, --files-without-match [only print filenames that are not matched]
Suppress normal output; instead print the name of each input
file from which no output would normally have been printed. The
scanning will stop on the first match.

-l, --files-with-matches [only print the matching filenames]
Suppress normal output; instead print the name of each input
file from which output would normally have been printed. The
scanning will stop on the first match. (-l is specified by
POSIX.)


[examples]
$ grep -r word_to_find .
[to search "word_to_find" recursively under the current directory ]


$ grep -n "> tmp1.txt" *.sh
[to find string "> tmp1.txt" in all files with extension name ".sh" in the current folder, not recursively]

xxx a common mistake
$ grep myutil.js .
grep: .: Is a directory
xxx

-->
$ grep -r myutil.js .
$ grep myutil.js *.js
$ grep "myutil.js" *.js

$ grep *.js *.js
[careful: the unquoted *.js pattern is glob-expanded by the shell before grep runs]
$ grep ".*js" *.js
[here, ".*js" is quoted, so grep receives it intact as a regex]

$ grep -l "> tmp1.txt" *.sh
[list all the names of files which contain "> tmp1.txt" ]

$ grep -r "myutil.js" . --exclude-dir=.git
$ grep -r "myutil.js" . --exclude-dir=?git
[though works as well, it means to exclude all subfolders whose name is even like agit, bgit, mgit, .git, etc. ]



[ref]
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_02.html

bash regex
http://tldp.org/LDP/Bash-Beginners-Guide/html/sect_04_01.html
http://tldp.org/LDP/abs/html/x17129.html

http://ubuntu.elfscript.com/2013/09/ubuntu.html

Monday, June 15, 2015

mongo, mongodb client shell, query



//=== http://docs.mongodb.org/manual/tutorial/getting-started-with-the-mongo-shell/
$ mongo
> db
> use another_db
> db.collectionName.find();

//=== http://docs.mongodb.org/v2.2/reference/method/db.collection.find/

db.products.find( { qty: { $gt: 25 } }, { item: 1, qty: 1 } )

This operation returns all the documents from the collection "products"
where "qty" is greater than 25.
The documents in the results include only the
"_id", "item", and "qty" fields,

using an "inclusion" projection given by the second parameter { item: 1, qty: 1 }.

find() always returns the _id field, even when not explicitly included:

Results:
{ "_id" : 11, "item" : "pencil", "qty" : 50 }
{ "_id" : ObjectId("50634d86be4617f17bb159cd"), "item" : "bottle", "qty" : 30 }
{ "_id" : ObjectId("50634dbcbe4617f17bb159d0"), "item" : "paper", "qty" : 100}

"exlcude" projection
db.products.find( { qty: { $gt: 25 } }, { _id: 0, qty: 0 } )

Results:
{ "item" : "pencil", "type" : "no.2" }
{ "item" : "bottle", "type" : "blue" }
{ "item" : "paper" }



//=== http://docs.mongodb.org/manual/tutorial/write-scripts-for-the-mongo-shell/

*** mongo client shell
$ mongo
> db = connect("localhost:27017/db1");
> show collections;
> db.collect1.insert({k1:'v1', k2:'v2', k3:'v3'});

> db=connect("db2");
> show collections;

Differences Between Interactive and Scripted mongo ?


To set the db global variable, use the getDB() method or the connect() method. You can assign the database reference to a variable other than db.

You cannot use any shell helper (e.g. use <db>, show dbs, etc.) inside the JavaScript file because they are not valid JavaScript.


Shell Helpers --> JavaScript Equivalents

show dbs, show databases --> db.adminCommand('listDatabases')

use <db> --> db = db.getSiblingDB('<db>')

show collections --> db.getCollectionNames()

show users --> db.getUsers()

show roles --> db.getRoles({showBuiltinRoles: true})

show log <logname> --> db.adminCommand({ 'getLog' : '<logname>' })

show logs --> db.adminCommand({ 'getLog' : '*' })



it -->
cursor = db.collection.find()
if ( cursor.hasNext() ){
    cursor.next();
}
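
*** there is no `it` helper outside the shell; a pymongo cursor is a plain Python iterator (sketch; collection name is a placeholder):

from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["test"]
cursor = db["collection"].find()
for doc in cursor:  # the shell's `it` pages through this same cursor
    print(doc)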





mongodb adaptor for nodejs


//=== http://stackoverflow.com/questions/24736999/how-can-i-require-mongodb-within-casperjs

install mongodb driver for nodejs
$ npm install mongodb

* this mongodb module is just an adaptor for nodejs to access an actual mongodb server;
you have to set up the mongodb server separately from the module.



//=== https://mongodb.github.io/node-mongodb-native/api-generated/mongoclient.html

MongoClient class's instance member functions:
connect(), open(), close(), db()

connect(url[, options], callback)
open(callback)
close(callback)
db(dbName)



"""...
var Db = require('mongodb').Db,
    MongoClient = require('mongodb').MongoClient,
    Server = require('mongodb').Server,
    ReplSetServers = require('mongodb').ReplSetServers,
    ObjectID = require('mongodb').ObjectID,
    Binary = require('mongodb').Binary,
    GridStore = require('mongodb').GridStore,
    Grid = require('mongodb').Grid,
    Code = require('mongodb').Code,
    BSON = require('mongodb').pure().BSON,
    assert = require('assert');

// Set up the connection to the local db
var mongoclient = new MongoClient(new Server("localhost", 27017), {native_parser: true});

// Open the connection to the server
mongoclient.open(function(err, mongoclient) {

    // Get the first db and do an update document on it
    var db = mongoclient.db("integration_tests");
    db.collection('mongoclient_test').update({a:1}, {b:1}, {upsert:true}, function(err, result) {
        assert.equal(null, err);
        assert.equal(1, result);

        // Get another db and do an update document on it
        var db2 = mongoclient.db("integration_tests2");
        db2.collection('mongoclient_test').update({a:1}, {b:1}, {upsert:true}, function(err, result) {
            assert.equal(null, err);
            assert.equal(1, result);

            // Close the connection
            mongoclient.close();
        });
    });
});

..."""


//=== static member function, MongoClient.connect(url, cb)
Connect to MongoDB using a url as documented


var MongoClient = require('mongodb').MongoClient;
var assert = require('assert');
var ObjectId = require('mongodb').ObjectID;
var url = 'mongodb://myMongo1:27017/test';

// the db handle is only valid inside the connect callback;
// calling db.collection(...) before connecting cannot work
MongoClient.connect(url, function(err, db) {
    assert.equal(null, err);
    db.collection('restaurants1').insertOne({name: 'r1'}, function(err, result) {  // {name:'r1'} is a sample document
        assert.equal(null, err);
        db.close();
    });
});







Wednesday, June 10, 2015

nodejs and phantomjs


//=== phantomjs-node module for nodejs
https://github.com/sgentle/phantomjs-node
"""...
This module expects the phantomjs binary to be in PATH somewhere. In other words, type this:

$ phantomjs
If that works, so will phantomjs-node.

..."""

0. verify that the phantomjs binary already works by
$ phantomjs --version

1. then install phantomjs-node module for nodejs by
$ npm install phantom


[ex]
var phantom = require('phantom');

phantom.create(function (ph) {
    ph.createPage(function (page) {
        page.open("http://www.google.com", function (status) {
            console.log("opened google? ", status);
            page.evaluate(function () { return document.title; }, function (result) {
                console.log('Page title is ' + result);
                ph.exit();
            });
        });
    });
});



"""...
... How do you communicate with a process that doesn't support shared memory, sockets, FIFOs, or standard input?

Well, there's one thing PhantomJS does support, and that's opening webpages. In fact, it's really good at opening web pages. So we communicate with PhantomJS by spinning up an instance of ExpressJS, opening Phantom in a subprocess, and pointing it at a special webpage that turns socket.io messages into alert() calls. Those alert() calls are picked up by Phantom and there you go!

The communication itself happens via James Halliday's fantastic dnode library, which fortunately works well enough when combined with browserify to run straight out of PhantomJS's pidgin Javascript environment.

..."""



//=== phantomjs module for nodejs
https://www.npmjs.com/package/phantomjs
[ An NPM wrapper for PhantomJS ]


$ npm install phantomjs

Or grab the source and

$ node ./install.js

What this installer is really doing is just grabbing a particular "blessed" version of PhantomJS

"""...
PhantomJS is not a library for NodeJS. It's a separate environment and code written for node is unlikely to be compatible. In particular PhantomJS does not expose a Common JS package loader.

This is an 'NPM wrapper' and can be used to conveniently make Phantom available.
It is not a 'NodeJS wrapper'.

... drive from within a node program by spawning phantom in a child process.

... tell NPM and the PhantomJS installer to skip validation of ssl keys with NPM's strict-ssl setting:

$ npm set strict-ssl false



..."""


//=== node-phantom module for nodejs
https://www.npmjs.com/package/node-phantom
[ This is a bridge between PhantomJs and Node.js. ]

0. The bridge assumes that the "phantomjs" binary is available in the PATH.

1. The only other dependency is socket.io

*** need to manually install socket.io and node-phantom

[ex]
var phantom = require('node-phantom');
phantom.create(function(err, ph) {
    return ph.createPage(function(err, page) {
        // completion of the truncated example, following the same callback style
        return page.open("http://www.google.com", function(err, status) {
            console.log("opened google? ", status);
            ph.exit();
        });
    });
});

//=== phantomjs on nodejs-docker
[err] phantomjs: error while loading shared libraries: libjpeg.so.8: cannot open shared object file: No such file or directory


Monday, June 1, 2015

how to transfer a docker image from one computer to another


http://stackoverflow.com/questions/23935141/how-to-copy-docker-images-from-one-host-to-another-without-via-repository
http://it.3hd.me/2015/03/docker-exportimport-pullcommit-saveload.html


//=== save/load

0. save the docker image as a tar file on pc1

$ docker save -o savefname.tar imgname
or
$ docker save imgname > savefname.tar


1. scp savefname.tar to pc2
[to save bandwidth, compress-> send-> decompress is recommended]

2. load savefname.tar to image on pc2
$ docker load < savefname.tar
or
$ docker load -i savefname.tar


3. verify 'imgname' is restored on pc2
$ docker images -a imgname



//=== export/import
[operating on container(running or not), the resultant img will lose previous history and layers]

0. on pc1:
$ docker export container1 > c1.tar

1. copy c1.tar from pc1 to pc2

2. on pc2:
$ cat c1.tar | docker import - new_imgname:tagname




//=== http://unix.stackexchange.com/questions/70581/scp-and-compress-at-the-same-time-no-intermediate-save
[Q] does scp compress files implicitly?

"""...
scp itself can compress the data

-C Compression enable. ...

$ scp -C source user@server:/path/to/backup


...
7z should not be used for backups on Linux. As stated on the 7z man page:

DO NOT USE the 7-zip format for backup purpose on Linux/Unix because :
- 7-zip does not store the owner/group of the file.


..."""


$ tar zcvf - MyBackups | ssh user@server "cat > /path/to/backup/foo.tgz"