免責聲明

Disclaimer (免責聲明)
繼續閱覽代表您接受以上的免責聲明.
To continue reading means you accept the above disclaimer.

2015年6月28日 星期日

mysql connections parallel insert



//=== node-mysql
https://github.com/felixge/node-mysql/
*** need multiple connections to execute queries in parallel.


//=== python, mysql
http://themattreid.com/wordpress/2010/08/30/easy-python-threading-mysql-connections/

"""...
You will run into performance problems with the Python threading module because of the GIL.
... Eventlet for non-blocking asynchronous calls (eventlet.GreenPool and similar) or
using the multiprocessing Python 2.6+ module.


..."""




"...
... single threaded database operations are simply too slow. In these cases it’s a matter of course that you’ll use multi-threading or forking to spawn secondary processes to handle the database actions.

...



import threading
import sys
import MySQLdb
import pdb

class threader(threading.Thread):
    def __init__(self,db):
        threading.Thread.__init__(self)
        self.db = db
    def run(self):
        run_insert(self.db)

def run_insert(db):
    sql = "INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');"
    print "thread executing sql:%s"%(sql)
    try:
        cursor = db.cursor()
        cursor.execute(sql)
        cursor.close()
        db.commit()
    except:
        print "insert failed"

    return

def init_thread(): 
    backgrounds = []
    for db in connections:
        print "connection: %s"%(db)
        background = threader(db)
        background.start()
        backgrounds.append(background)

    for background in backgrounds:
        background.join()


def main():
    try:
        init_thread()
    except:
        print "failed to initiate threads"
    sys.exit(0)




if __name__ == "__main__":
    ....
    threads = int(4) #quantity of execution threads and size of connection pool

    connections = []
    for thread in range(threads):
        try:
            connections.append(MySQLdb.connect(host=mysql_host, user=mysql_user, passwd=mysql_pass, db=mysql_db, port=mysql_port))
            #pdb.set_trace()

        except MySQLdb.Error, e:
            print "Error %d: %s"%(e.args[0], e.args[1])
            pdb.set_trace()            
            sys.exit (1)


    main()



..."""




//=== http://stackoverflow.com/questions/12919047/mysql-connector-pooling-with-many-async-inserts
C#, mysql connector, pool

"""...
It is suggested to not use a global MySqlConnection object ...
Further, it's suggested to use the MySqlHelper instead of working with MySqlCommand objects.

...

MySqlHelper.ExecuteNonQuery(CONNECTION_STRING,
@"INSERT INTO `table1` (`col3`,`col4`) VALUES (@col3, @col4 );",
(new List() {
new MySqlParameter("@col3", @"another value"),
new MySqlParameter("@col4", @"some value")
}).ToArray()
);
Now, imagine that I make the above call a few thousand times in async parallel tasks


added this in my connection string:
Pooling=True;minimumPoolSize=0;maximumpoolsize=100;

...
Yet, I still get max connections exceeded

need multiple connections to execute queries in parallel.
...
[on server side] In addition to max_connections, the max_user_connections is what is important here. It can be queried for:
show variables like 'max_user_connections';

...
To avoid exceeding max connections for your MySQL user you'll need to make sure the maximumpoolsize (in the connection string) is set to something lower than max_user_connections



..."""



*** python threading or multiprocessing ?
//=== http://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python

GIL: Global Interpreter Lock

"""...

... multiprocessing gets around the Global Interpreter Lock, but what other advantages are there

...
 The threading module uses threads, the multiprocessing uses processes. 
The difference is that threads run in the same memory space, 
while processes have separate memory. 
This makes it a bit harder to share objects between processes with multiprocessing. 
Since threads use the same memory, precautions have to be taken 
or two threads will write to the same memory at the same time. 
This is what the global interpreter lock is for.

Spawning processes is a bit slower than spawning threads. 
Once they are running, there is not much difference.


...
Threading's job is to enable applications to be responsive. Suppose you have a database connection and you need to respond to user input.
... By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures - good performance, plus a flexible software design.

Note that due to the GIL the app isn't actually doing two things at once, but what we've done is 
put the resource lock on the database into a separate thread so that 
CPU time can be switched between it and the user interaction. 
...


Multiprocessing is for times when you really do want more than one thing to be done at any given time. 
Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset. 
Putting each job in a separate thread might help a little because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel ... 

By putting each job in a Multiprocessing process, each can run on it's own CPU and run at full efficiency.




..."""


沒有留言:

張貼留言