3H ITek Studio: mysql connections parallel insert

2015年6月28日星期日

mysql connections parallel insert





//=== node-mysql

https://github.com/felixge/node-mysql/

*** need multiple connections to execute queries in parallel.





//=== python, mysql

http://themattreid.com/wordpress/2010/08/30/easy-python-threading-mysql-connections/



"""...

You will run into performance problems with the Python threading module because of the GIL. 

... Eventlet for non-blocking asynchronous calls (eventlet.GreenPool and similar) or 

using the multiprocessing Python 2.6+ module.





..."""









"...

... single threaded database operations are simply too slow. In these cases it’s a matter of course that you’ll use multi-threading or forking to spawn secondary processes to handle the database actions.



...

import threading
import sys
import MySQLdb
import pdb

class threader(threading.Thread):
    def __init__(self,db):
        threading.Thread.__init__(self)
        self.db = db
    def run(self):
        run_insert(self.db)

def run_insert(db):
    sql = "INSERT INTO `test` (`id`,`col_a`,`col_b`,`col_c`) VALUES (NULL,'0','0','0');"
    print "thread executing sql:%s"%(sql)
    try:
        cursor = db.cursor()
        cursor.execute(sql)
        cursor.close()
        db.commit()
    except:
        print "insert failed"

    return

def init_thread(): 
    backgrounds = []
    for db in connections:
        print "connection: %s"%(db)
        background = threader(db)
        background.start()
        backgrounds.append(background)

    for background in backgrounds:
        background.join()


def main():
    try:
        init_thread()
    except:
        print "failed to initiate threads"
    sys.exit(0)




if __name__ == "__main__":
    ....
    threads = int(4) #quantity of execution threads and size of connection pool

    connections = []
    for thread in range(threads):
        try:
            connections.append(MySQLdb.connect(host=mysql_host, user=mysql_user, passwd=mysql_pass, db=mysql_db, port=mysql_port))
            #pdb.set_trace()

        except MySQLdb.Error, e:
            print "Error %d: %s"%(e.args[0], e.args[1])
            pdb.set_trace()            
            sys.exit (1)


    main()





..."""









//=== http://stackoverflow.com/questions/12919047/mysql-connector-pooling-with-many-async-inserts

C#, mysql connector, pool



"""...

 It is suggested to not use a global MySqlConnection object ...

Further, it's suggested to use the MySqlHelper instead of working with MySqlCommand objects.



...



MySqlHelper.ExecuteNonQuery(CONNECTION_STRING,

    @"INSERT INTO `table1` (`col3`,`col4`) VALUES (@col3, @col4 );",

    (new List() {

                new MySqlParameter("@col3", @"another value"),

                new MySqlParameter("@col4", @"some value")    

            }).ToArray()

    );

Now, imagine that I make the above call a few thousand times in async parallel tasks





added this in my connection string:

Pooling=True;minimumPoolSize=0;maximumpoolsize=100;



...

Yet, I still get max connections exceeded



need multiple connections to execute queries in parallel.

...

[on server side] In addition to max_connections, the max_user_connections is what is important here. It can be queried for:

show variables like 'max_user_connections';



...

To avoid exceeding max connections for your MySQL user you'll need to make sure the maximumpoolsize (in the connection string) is set to something lower than max_user_connections







..."""

*** python threading or multiprocessing ?
//=== http://stackoverflow.com/questions/3044580/multiprocessing-vs-threading-python

GIL: Global Interpreter Lock

"""...

... multiprocessing gets around the Global Interpreter Lock, but what other advantages are there

...
The threading module uses threads, the multiprocessing uses processes.
The difference is that threads run in the same memory space,
while processes have separate memory.
This makes it a bit harder to share objects between processes with multiprocessing.
Since threads use the same memory, precautions have to be taken
or two threads will write to the same memory at the same time.
This is what the global interpreter lock is for.

Spawning processes is a bit slower than spawning threads.
Once they are running, there is not much difference.

...
Threading's job is to enable applications to be responsive. Suppose you have a database connection and you need to respond to user input.
... By splitting off the database connection into a separate thread you can make the application more responsive. Also because both threads are in the same process, they can access the same data structures - good performance, plus a flexible software design.

Note that due to the GIL the app isn't actually doing two things at once, but what we've done is
put the resource lock on the database into a separate thread so that
CPU time can be switched between it and the user interaction.
...

Multiprocessing is for times when you really do want more than one thing to be done at any given time.
Suppose your application needs to connect to 6 databases and perform a complex matrix transformation on each dataset.
Putting each job in a separate thread might help a little because when one connection is idle another one could get some CPU time, but the processing would not be done in parallel ...

By putting each job in a Multiprocessing process, each can run on it's own CPU and run at full efficiency.

..."""

3H ITek Studio

免責聲明

2015年6月28日星期日

mysql connections parallel insert

沒有留言:

張貼留言

Haxe Links

SmartCard Infos

Rounded Corner

ThrashBox2

TrashBox Test

Office Ribbon Links

MSI Conditions Links

網誌存檔

關於我自己

免責聲明

2015年6月28日 星期日

mysql connections parallel insert

沒有留言:

張貼留言

Haxe Links

SmartCard Infos

Rounded Corner

ThrashBox2

TrashBox Test

Office Ribbon Links

MSI Conditions Links

網誌存檔

關於我自己

2015年6月28日星期日