Tuesday, December 18, 2012

Ehcache or Memcached ?



Should we use memcached or should we go with Java based ehcache? It is not a million dollar question, but at least qualifies as a 100 dollar question :)
I think ehcache be a good option if it is used as an embedded cache in the application server, as it won’t involve sockets and serializations. Of course the application or application server must be in Java.
For all other modes, I think performance of ehcache will suffer as it will involve serialize/unserialize of data and data transfer will happen over sockets.  Then I doubt ehcache will be able perform better than memcached.

One major advantage of memcached  is that it has clients almost in every languages. The disadvantage is,  it doesn’t support persistence natively.  Membase (or couchbase) and memcachedb have data persistence  and they speak memcached protocol.  But do we need the memory cache to be persistent anyway ?

Ehcache supports HA. Then it cannot act as an embedded cache alone. 
HA in memcached can also be achieved very easily though generally the clients don’t support it directly. We need to just put the key-value pair in the memcached server that key is hashed to , and also in the next memcached server in the cluster (or also putting the key-value on a backup cluster of memcached servers).

Achieving elasticity with memcached looks like a big thing, because I don’t know if we can list all the keys in a memcached servers. There is the “stats” command for getting a list of keys, but no idea if it returns all the keys. If we can get the list of all keys we can use “extendable hashing” to shuffle few keys across the nodes. Only thing is clients also need to be made aware  of addition/deletion of new memcached servers in a cluster of memcached servers.  I am wondering how in a HA environment ehcache does that,  basically how the clients (especially the REST based non-Java clients) will know of the new destination for a key in cache of there is a change in number of ehcache nodes in the cluster.

I guess for embedded cache (for JAVA apps only) ehcache will be better.
In all other cases  memcached might  be better, but that is my thought only.

Monday, December 17, 2012

Simple program to list files sorted by size in a directory recursively

Below is a simple program to list files in a directory recursively and sort them by their sizes. I am using this tool in my work for some time as I want to detect huge files in some directories to free up disk spaces by deleting them. Hopefully it will be useful for you as well and hence I am posting it here :)   I am using boost filesystem library to do the real work.

#include <iostream>                                            
#include <iterator>                                            
#include <queue>                                               
#include <vector>                                              
#include <map>                                                 
#include <algorithm>                                           
#include <boost/filesystem.hpp>                                

using namespace std;
using namespace boost::filesystem;

int main(int argc, char *argv[])
{                               
    queue<path> directories;    
    multimap<uintmax_t, path> filesizes;

    if (argc < 2) {
        cout << "Usage: " << argv[1] <<  " path\n";
        return 1;                                  
    }                                              
    path p(argv[1]);                               

    try {
        if (exists(p)){
            if (is_regular_file(p)){
                cout << file_size(p) << "  " << p << endl;
                return 0;                                 
            }

            if (!is_directory(p)) {
                return 0;
            }
            directories.push(p);
            vector <path> entries;
            path thispath;
            while (!directories.empty()) {
                thispath = directories.front();
                directories.pop();
                copy(directory_iterator(thispath), directory_iterator(), back_inserter(entries));
                for (vector <path>::iterator it = entries.begin();
                        it != entries.end(); ++it) {
                    if (!exists(*it))
                        continue;
                    if (!is_symlink(*it) && is_directory(*it)) {
                        directories.push(*it);
                    } else if (is_regular(*it)) {
                        uintmax_t file_siz = file_size(*it);
                        filesizes.insert(pair<uintmax_t, path>(file_siz, *it));
                    }
                }
                entries.clear();
            }
        }
        for (multimap<uintmax_t, path>::iterator it1 = filesizes.begin(), it2 = filesizes.end();
                it1 != it2; ++it1) {
            cout << it1->first << " " << it1->second << endl;
        }
        filesizes.clear();
    } catch(const filesystem_error & ex) {
        cout << ex.what() << '\n';
    }
    return 0;
}



Tuesday, December 11, 2012

Apache Zookeeper examples for distributed configuration service, distributed registry, distributed coordination etc

Apache Zookeeper is an excellent piece of software that is really helpful for achieving the below stuffs in distributed systems:

  • coordination 
  • synchronization
  • lock service
  • configuration service
  • naming registry
etc. etc.
Zookeeper supports high availability by running multiple instances of the service. In case of one of the server that the clients are connecting to goes down, then it will switch over another server transparently. Great thing is irrespective of the server a client may connect to, it will see the updates in the same order. It is a high performance system and can be used for large distributed systems as well.
Zookeeper lets clients coordinate by a shared hierarchical namespace. It is a tree like structure as in normal file system. A client will be connected to a single zookeeper server (as long as the server is accessible).

Please go through Apache Zookeeper getting started guide for how to build, configure zookeeper first.
All these examples are available  in github.

Below is an example (zoo_create_node.c) for how to create Znodes (nodes in the zookeeper database).
In this example we demonstrate a simple way to create nodes in the zookeeper
server. Twenty nodes will be created with path /testpath0, /testpath1,     
/testpath2, /testpath3, ....., /testpath19.                                
All these nodes will be initialized to the same value "myvalue1".          
We will use zookeeper synchronus API to create the nodes. As soon as our   
client enters connected state, we start creating the nodes.                

All the examples used latest stable version of zookeeper that is version
3.3.6                                                                  
We may use the zookeeper client program to get the nodes and examine their
contents.                                                                
Suppose you have downloaded and extracted zookeeper to a directory       
/home/yourdir/packages/zookeeper-3.3.6 . Then after you build the C libraries
,they will be available at /home/yourdir/packages/zookeeper-3.3.6/src/c/.libs/
and the c command line tool will available at                                
/home/yourdir/packages/zookeeper-3.3.6/src/c. The command line tools are cli_st
and cli_mt and cli. They are convenient tool to examine zookeeper data.       

Compile the below code as shown below:
$gcc -o testzk1 zoo_create_node.c -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/include -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/generated -L \
/home/yourdir/packages/zookeeper-3.3.6/src/c/.libs/ -lzookeeper_mt
 *                                                               
Make sure that your LD_LIBRARY_PATH includes the zookeeper C libraries to run
the example. Before you run the example, you have to configure and run the  
zookeeper server. Please go through the zookeeper wiki how to do that.      
Now you run the example as shown below:                                     
./testzk1 127.0.0.1:22181  # Assuming zookeeper server is listening on port 
22181 and IP 127.0.0.1                                                      
Now use one of the cli tools to examine the znodes created and also their   
values.

#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>  
#include <time.h>      
#include <stdlib.h>    
#include <stdio.h>     
#include <string.h>    
#include <errno.h>     

#include <zookeeper.h>
static const char *hostPort;
static zhandle_t *zh;       
static clientid_t myid;     
static int connected;       

void watcher(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                        
{                                                                  
    if (type == ZOO_SESSION_EVENT) {                               
        if (state == ZOO_CONNECTED_STATE) {                        
            connected = 1;                                         
        } else if (state == ZOO_AUTH_FAILED_STATE) {               
            zookeeper_close(zzh);                                  
            exit(1);                                               
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {           
            zookeeper_close(zzh);                                  
            exit(1);                                               
        }                                                          
    }                                                              
}                                                                  


int main(int argc, char *argv[])
{                               
    int rc;                     
    int fd;                     
    int interest;               
    int events;                 
    struct timeval tv;          
    fd_set rfds, wfds, efds;    

    if (argc != 2) {
        fprintf(stderr, "USAGE: %s host:port\n", argv[0]);
        exit(1);                                          
    }                                                     

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_ZERO(&efds);

    zoo_set_debug_level(ZOO_LOG_LEVEL_INFO);
    zoo_deterministic_conn_order(1);        
    hostPort = argv[1];                     
    int x = 0;                              
    zh = zookeeper_init(hostPort, watcher, 30000, &myid, 0, 0);
    if (!zh) {                                                 
        return errno;                                          
    }                                                          
    while (1) {                                                
        char mypath[255];                                      
        zookeeper_interest(zh, &fd, &interest, &tv);           
        usleep(10);                                            
        memset(mypath, 0, 255);                                
        if (connected) {                                       
            while (x < 20) {                                   
                sprintf(mypath, "/testpath%d", x);             
                usleep(10);                                    
                rc = zoo_create(zh, mypath, "myvalue1", 9, &ZOO_OPEN_ACL_UNSAFE, 0, 0, 0);
                if (rc){                                                                  
                    printf("Problems %s %d\n", mypath, rc);                               
                }                                                                         
                x++;                                                                      
            }                                                                             
            connected++;
        }
        if (fd != -1) {
            if (interest&ZOOKEEPER_READ) {
                FD_SET(fd, &rfds);
            } else {
                FD_CLR(fd, &rfds);
            }
            if (interest&ZOOKEEPER_WRITE) {
                FD_SET(fd, &wfds);
            } else {
                FD_CLR(fd, &wfds);
            }
        } else {
            fd = 0;
        }
        FD_SET(0, &rfds);
        rc = select(fd+1, &rfds, &wfds, &efds, &tv);
        events = 0;
        if (rc > 0) {
            if (FD_ISSET(fd, &rfds)) {
                    events |= ZOOKEEPER_READ;
            }
            if (FD_ISSET(fd, &wfds)) {
                events |= ZOOKEEPER_WRITE;
            }
        }
        zookeeper_process(zh, events);
        if (2 == connected ) {
            // We created the nodes, so we will exit now
            zookeeper_close(zh);
            break;
        }
    }
    return 0;
}


In the below example (zoo_get_node.c) we demonstrate a simple way to get nodes in the zookeeper
server. Twenty nodes with path /testpath0, /testpath1,                       
/testpath2, /testpath3, ....., /testpath19 will be examined and value        
associated with them will be printed.                                        
We will use zookeeper synchronus API to get the value of the nodes. As soon  
as our client enters connected state, we start getting the node values.      

Compile the code as shown below:
$gcc -o testzk1 zoo_get_node.c -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/include -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/generated -L \
/home/yourdir/packages/zookeeper-3.3.6/src/c/.libs/ -lzookeeper_mt
 *                                                               
Now you run the example as shown below:                          
./testzk1 127.0.0.1:22181  # Assuming zookeeper server is listening on port
22181 and IP 127.0.0.1                                                    

#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>  
#include <time.h>      
#include <stdlib.h>    
#include <stdio.h>     
#include <string.h>    
#include <errno.h>     

#include <zookeeper.h>
static const char *hostPort;
static zhandle_t *zh;       
static clientid_t myid;     
static int connected;       

void watcher(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                        
{                                                                  
    if (type == ZOO_SESSION_EVENT) {                               
        if (state == ZOO_CONNECTED_STATE) {                        
            connected = 1;                                         
        } else if (state == ZOO_AUTH_FAILED_STATE) {               
            zookeeper_close(zzh);                                  
            exit(1);                                               
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {           
            zookeeper_close(zzh);                                  
            exit(1);                                               
        }                                                          
    }                                                              
}                                                                  

void watcherforwget(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                               
{                                                                         
    char *p = (char *)context;                                            
    if (type == ZOO_SESSION_EVENT) {                                      
        if (state == ZOO_CONNECTED_STATE) {                               
            connected = 1;                                                
        } else if (state == ZOO_AUTH_FAILED_STATE) {                      
            zookeeper_close(zzh);                                         
            exit(1);                                                      
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {                  
            zookeeper_close(zzh);                                         
            exit(1);                                                      
        }                                                                 
    }                                                                     
    printf("Watcher context %s\n", p);                                    
}                                                                         

int main(int argc, char *argv[])
{                               
    int rc;                     
    int fd;                     
    int interest;               
    int events;                 
    struct timeval tv;          
    fd_set rfds, wfds, efds;    

    if (argc != 2) {
        fprintf(stderr, "USAGE: %s host:port\n", argv[0]);
        exit(1);                                          
    }                                                     

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_ZERO(&efds);

    zoo_set_debug_level(ZOO_LOG_LEVEL_INFO);
    zoo_deterministic_conn_order(1);        
    hostPort = argv[1];                     
    int x = 0;                              
    zh = zookeeper_init(hostPort, watcher, 30000, &myid, 0, 0);
    if (!zh) {                                                 
        return errno;                                          
    }                                                          
    while (1) {                                                
        char mypath[255];                                      
        char buffer[255];                                      
        struct Stat st;                                        
        zookeeper_interest(zh, &fd, &interest, &tv);           
        usleep(10);                                            
        memset(mypath, 0, 255);                                
        memset(buffer, 0, 255);                                
        if (connected) {                                       
            char mycontext[] = "This is context data for test";
            int len = 254;                                     
            while (x < 20) {                                   
                sprintf(mypath, "/testpath%d", x);             
                usleep(10);                                    
                rc = zoo_wget(zh, mypath, watcherforwget , mycontext, buffer, &len, &st);
                if (ZOK != rc){                                                          
                    printf("Problems %s %d\n", mypath, rc);                              
                } else if (len >= 0) {                                                   
                   buffer[len] = 0;                                                      
                   printf("Path: %s Data: %s\n", mypath, buffer);                        
                }                                                                        
                x++;                                                                     
                len = 254;                                                               
            }                                                                            
            connected++;
        }
        if (fd != -1) {
            if (interest&ZOOKEEPER_READ) {
                FD_SET(fd, &rfds);
            } else {
                FD_CLR(fd, &rfds);
            }
            if (interest&ZOOKEEPER_WRITE) {
                FD_SET(fd, &wfds);
            } else {
                FD_CLR(fd, &wfds);
            }
        } else {
            fd = 0;
        }
        FD_SET(0, &rfds);
        rc = select(fd+1, &rfds, &wfds, &efds, &tv);
        events = 0;
        if (rc > 0) {
            if (FD_ISSET(fd, &rfds)) {
                    events |= ZOOKEEPER_READ;
            }
            if (FD_ISSET(fd, &wfds)) {
                events |= ZOOKEEPER_WRITE;
            }
        }
        zookeeper_process(zh, events);
        if (2 == connected ) {
            // We created the nodes, so we will exit now
            zookeeper_close(zh);
            break;
        }
    }
    return 0;
}

In this example below (zoo_data_watches.c) we demonstrate a simple way to watch nodes in the zookeeper                                          
server. Twenty nodes with path /testpath0, /testpath1, /testpath2, /testpath3, .....,                                      
/testpath19 will be watches for any changes in their values.                                                               
We will use zookeeper synchronus API to watch the nodes. As soon                                                           
as our client enters connected state, we start putting watches for the nodes.                                              

Compile the below code as shown below:
$gcc -o testzk1 zoo_data_watches.c -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/include -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/generated -L \
/home/yourdir/packages/zookeeper-3.3.6/src/c/.libs/ -lzookeeper_mt
 *                                                               
Now you run the example as shown below:                          
./testzk1 127.0.0.1:22181  # Assuming zookeeper server is listening on port
22181 and IP 127.0.0.1                                    

#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>  
#include <time.h>      
#include <stdlib.h>    
#include <stdio.h>     
#include <string.h>    
#include <errno.h>     

#include <zookeeper.h>
static const char *hostPort;
static zhandle_t *zh;       
static clientid_t myid;     
static int connected;       
static char mycontext[] = "This is context data for test";

void watcher(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                        
{                                                                  
    if (type == ZOO_SESSION_EVENT) {                               
        if (state == ZOO_CONNECTED_STATE) {                        
            connected = 1;                                         
        } else if (state == ZOO_AUTH_FAILED_STATE) {               
            zookeeper_close(zzh);                                  
            exit(1);                                               
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {           
            zookeeper_close(zzh);                                  
            exit(1);                                               
        }                                                          
    }                                                              
}                                                                  

void watcherforwget(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                               
{                                                                         
    char buffer[255];                                                     
    int len, rc;                                                          
    struct Stat st;                                                       
    char *p = (char *)context;                                            
    if (type == ZOO_SESSION_EVENT) {                                      
        if (state == ZOO_CONNECTED_STATE) {                               
            return;                                                       
        } else if (state == ZOO_AUTH_FAILED_STATE) {                      
            zookeeper_close(zzh);                                         
            exit(1);                                                      
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {                  
            zookeeper_close(zzh);                                         
            exit(1);                                                      
        }                                                                 
    } else if (type == ZOO_CHANGED_EVENT) {                               
        printf("Data changed for %s \n", path);                           
        len = 254;                                                        
        //get the changed data and set an watch again                     
        rc = zoo_wget(zh, path, watcherforwget , mycontext, buffer, &len, &st);
        if (ZOK != rc){                                                        
            printf("Problems %s %d\n", path, rc);                              
        } else if (len >= 0) {                                                 
           buffer[len] = 0;                                                    
           printf("Path: %s changed data: %s\n", path, buffer);                
        }                                                                      
    }                                                                          

    printf("Watcher context %s\n", p);
}                                     

int main(int argc, char *argv[])
{                               
    int rc;                     
    int fd;                     
    int interest;               
    int events;                 
    struct timeval tv;          
    fd_set rfds, wfds, efds;    

    if (argc != 2) {
        fprintf(stderr, "USAGE: %s host:port\n", argv[0]);
        exit(1);                                          
    }                                                     

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_ZERO(&efds);

    zoo_set_debug_level(ZOO_LOG_LEVEL_INFO);
    zoo_deterministic_conn_order(1);        
    hostPort = argv[1];                     
    int x = 0;                              
    zh = zookeeper_init(hostPort, watcher, 30000, &myid, 0, 0);
    if (!zh) {                                                 
        return errno;                                          
    }                                                          
    while (1) {                                                
        char mypath[255];                                      
        char buffer[255];                                      
        struct Stat st;                                        
        zookeeper_interest(zh, &fd, &interest, &tv);           
        usleep(10);                                            
        memset(mypath, 0, 255);                                
        memset(buffer, 0, 255);                                
        if (connected) {                                       
            //Put the watches for the nodes                    
            int len = 254;                                     
            while (x < 20) {                                   
                sprintf(mypath, "/testpath%d", x);             
                usleep(10);                                    
                rc = zoo_wget(zh, mypath, watcherforwget , mycontext, buffer, &len, &st);
                if (ZOK != rc){                                                          
                    printf("Problems %s %d\n", mypath, rc);                              
                } else if (len >= 0) {                                                   
                   buffer[len] = 0;                                                      
                   printf("Path: %s Data: %s\n", mypath, buffer);
                }
                x++;
                len = 254;
            }
            connected++;
        }
        if (fd != -1) {
            if (interest&ZOOKEEPER_READ) {
                FD_SET(fd, &rfds);
            } else {
                FD_CLR(fd, &rfds);
            }
            if (interest&ZOOKEEPER_WRITE) {
                FD_SET(fd, &wfds);
            } else {
                FD_CLR(fd, &wfds);
            }
        } else {
            fd = 0;
        }
        FD_SET(0, &rfds);
        rc = select(fd+1, &rfds, &wfds, &efds, &tv);
        events = 0;
        if (rc > 0) {
            if (FD_ISSET(fd, &rfds)) {
                    events |= ZOOKEEPER_READ;
            }
            if (FD_ISSET(fd, &wfds)) {
                events |= ZOOKEEPER_WRITE;
            }
        }
        zookeeper_process(zh, events);
    }
    return 0;
}


In this example below (zoo_data_watches.c) we demonstrate a simple way to watch the appearance of a node
in the server. It will also watch if the node is deleted after it was created.

In this program we will watch for the appearance and deletion of a node
"/testforappearance". Once we create the node from another program, the watch
event will be sent to the client and the watcher routine woll be called.    
 *                                                                          

Compile the below code as shown below:
$gcc -o testzk1 zoo_exist_watch.c -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/include -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/generated -L \
/home/yourdir/packages/zookeeper-3.3.6/src/c/.libs/ -lzookeeper_mt
 *                                                               
./testzk1 127.0.0.1:22181  # Assuming zookeeper server is listening on port 22181 and
IP 127.0.0.1                                                                         

#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h> 
#include <time.h>     
#include <stdlib.h>   
#include <stdio.h>    
#include <string.h>   
#include <errno.h>    

#include <zookeeper.h>
static const char *hostPort;
static zhandle_t *zh;      
static clientid_t myid;    
static int connected;      
static char mycontext[] = "This is context data for test";

void watcher(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                       
{                                                                 
    if (type == ZOO_SESSION_EVENT) {                              
        if (state == ZOO_CONNECTED_STATE) {                       
            connected = 1;                                        
        } else if (state == ZOO_AUTH_FAILED_STATE) {              
            zookeeper_close(zzh);                                 
            exit(1);                                              
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {          
            zookeeper_close(zzh);                                 
            exit(1);                                              
        }                                                         
    }                                                             
}                                                                 

void watchexistence(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                              
{                                                                        
    static struct Stat st;                                               
    int rc;                                                              

    if (type == ZOO_SESSION_EVENT) {
        if (state == ZOO_CONNECTED_STATE) {
            return;                       
        } else if (state == ZOO_AUTH_FAILED_STATE) {
            zookeeper_close(zzh);                  
            exit(1);                               
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {
            zookeeper_close(zzh);                      
            exit(1);                                   
        }                                              
    } else if (type == ZOO_CREATED_EVENT) {            
        printf("Node appeared %s, now Let us watch for its delete \n", path);
        rc = zoo_wexists(zh, path,                                          
                watchexistence , mycontext, &st);                           
        if (ZOK != rc){                                                     
            printf("Problems  %d\n", rc);                                   
        }                                                                   
    } else if (type == ZOO_DELETED_EVENT) {                                 
        printf("Node deleted %s, now Let us watch for its creation \n", path);
        rc = zoo_wexists(zh, path,                                           
                watchexistence , mycontext, &st);                            
        if (ZOK != rc){                                                      
            printf("Problems  %d\n", rc);                                    
        }                                                                    
    }                                                                        
}                                                                            

int main(int argc, char *argv[])
{                              
    int rc;                    
    int fd;                    
    int interest;              
    int events;                
    struct timeval tv;         
    fd_set rfds, wfds, efds;   

    if (argc != 2) {
        fprintf(stderr, "USAGE: %s host:port\n", argv[0]);
        exit(1);                                         
    }                                                    

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_ZERO(&efds);

    zoo_set_debug_level(ZOO_LOG_LEVEL_INFO);
    zoo_deterministic_conn_order(1);       
    hostPort = argv[1];                    

    zh = zookeeper_init(hostPort, watcher, 30000, &myid, 0, 0);
    if (!zh) {                                                
        return errno;                                         
    }                                                         

    while (1) {
        static struct Stat st;

        zookeeper_interest(zh, &fd, &interest, &tv);
        usleep(10);                                
        if (connected == 1) {                      
            // watch existence of the node         
            usleep(10);                            
            rc = zoo_wexists(zh, "/testforappearance",
                    watchexistence , mycontext, &st);
            if (ZOK != rc){
                printf("Problems  %d\n", rc);
            }
            connected++;
        }
        if (fd != -1) {
            if (interest & ZOOKEEPER_READ) {
                FD_SET(fd, &rfds);
            } else {
                FD_CLR(fd, &rfds);
            }
            if (interest & ZOOKEEPER_WRITE) {
                FD_SET(fd, &wfds);
            } else {
                FD_CLR(fd, &wfds);
            }
        } else {
            fd = 0;
        }
        FD_SET(0, &rfds);
        rc = select(fd+1, &rfds, &wfds, &efds, &tv);
        events = 0;
        if (rc > 0) {
            if (FD_ISSET(fd, &rfds)) {
                    events |= ZOOKEEPER_READ;
            }
            if (FD_ISSET(fd, &wfds)) {
                events |= ZOOKEEPER_WRITE;
            }
        }
        zookeeper_process(zh, events);
    }
    return 0;
}


In this example below(zoo_children_watch.c) we demonstrate a simple way to monitot the appearance of children znode in a path
in the server. We will check znode /testpath1 for its children and will examine if any children is
added or deleted under this path.

Compile the below code as shown below:
$gcc -o testzk1 zoo_childten_watch.c -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/include -I \
/home/yourdir/packages/zookeeper-3.3.6/src/c/generated -L \
/home/yourdir/packages/zookeeper-3.3.6/src/c/.libs/ -lzookeeper_mt
*
Make sure that your LD_LIBRARY_PATH includes the zookeeper C libraries to run the example. Before
you run the example, you have to configure and run the zookeeper server. Please go through the
zookeeper wiki how to do that. Now you run the example as shown below:
./testzk1 127.0.0.1:22181 # Assuming zookeeper server is listening on port 22181 and IP 127.0.0.1

#include <unistd.h>
#include <sys/select.h>
#include <sys/time.h>  
#include <time.h>      
#include <stdlib.h>    
#include <stdio.h>     
#include <string.h>    
#include <errno.h>     

#include <zookeeper.h>
static const char *hostPort;
static zhandle_t *zh;       
static clientid_t myid;     
static int connected;       
static char mycontext[] = "This is context data for test";

void watcher(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                        
{                                                                  
    if (type == ZOO_SESSION_EVENT) {                               
        if (state == ZOO_CONNECTED_STATE) {                        
            connected = 1;                                         
        } else if (state == ZOO_AUTH_FAILED_STATE) {               
            zookeeper_close(zzh);                                  
            exit(1);                                               
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {           
            zookeeper_close(zzh);                                  
            exit(1);                                               
        }                                                          
    }                                                              
}                                                                  

void watchchildren(zhandle_t *zzh, int type, int state, const char *path,
             void* context)                                              
{                                                                        
    struct String_vector str;                                            
    int rc;                                                              

    printf("The event path %s, event type %d\n", path, type);
    if (type == ZOO_SESSION_EVENT) {                         
        if (state == ZOO_CONNECTED_STATE) {                  
            return;                                          
        } else if (state == ZOO_AUTH_FAILED_STATE) {         
            zookeeper_close(zzh);                            
            exit(1);                                         
        } else if (state == ZOO_EXPIRED_SESSION_STATE) {     
            zookeeper_close(zzh);                            
            exit(1);                                         
        }                                                    
    }                                                        
    // Put the watch again                                   
    rc = zoo_wget_children(zh, "/testpath1",                 
            watchchildren , mycontext, &str);                
    if (ZOK != rc){                                          
        printf("Problems  %d\n", rc);                        
    } else {                                                 
        int i = 0;                                           
        while (i < str.count) {                              
            printf("Children %s\n", str.data[i++]);          
        }                                                    
        if (str.count) {                                     
            deallocate_String_vector(&str);                  
        }                                                    
    }                                                        
}                                                            

int main(int argc, char *argv[])
{                               
    int rc;                     
    int fd;                     
    int interest;               
    int events;                 
    struct timeval tv;          
    fd_set rfds, wfds, efds;    

    if (argc != 2) {
        fprintf(stderr, "USAGE: %s host:port\n", argv[0]);
        exit(1);                                          
    }                                                     

    FD_ZERO(&rfds);
    FD_ZERO(&wfds);
    FD_ZERO(&efds);

    zoo_set_debug_level(ZOO_LOG_LEVEL_INFO);
    zoo_deterministic_conn_order(1);        
    hostPort = argv[1];                     

    zh = zookeeper_init(hostPort, watcher, 30000, &myid, 0, 0);
    if (!zh) {                                                 
        return errno;                                          
    }                                                          

    while (1) {
        zookeeper_interest(zh, &fd, &interest, &tv);
        usleep(10);                                 
        if (connected == 1) {                       
            struct String_vector str;               

            usleep(10);
            // watch existence of the node
            rc = zoo_wget_children(zh, "/testpath1", 
                    watchchildren , mycontext, &str);
            if (ZOK != rc){                          
                printf("Problems  %d\n", rc);        
            } else {                                 
                int i = 0;                           
                while (i < str.count) {              
                    printf("Children %s\n", str.data[i++]);
                }
                if (str.count) {
                    deallocate_String_vector(&str);
                }
            }
            connected++;
        }
        if (fd != -1) {
            if (interest & ZOOKEEPER_READ) {
                FD_SET(fd, &rfds);
            } else {
                FD_CLR(fd, &rfds);
            }
            if (interest & ZOOKEEPER_WRITE) {
                FD_SET(fd, &wfds);
            } else {
                FD_CLR(fd, &wfds);
            }
        } else {
            fd = 0;
        }
        FD_SET(0, &rfds);
        rc = select(fd+1, &rfds, &wfds, &efds, &tv);
        events = 0;
        if (rc > 0) {
            if (FD_ISSET(fd, &rfds)) {
                    events |= ZOOKEEPER_READ;
            }
            if (FD_ISSET(fd, &wfds)) {
                events |= ZOOKEEPER_WRITE;
            }
        }
        zookeeper_process(zh, events);
    }
    return 0;
}


Monday, November 12, 2012

Print all the permutations of the characters in a word

A interesting computer programming problem is to print the all permutations of a the characters in a word.

How can we do that? Below is a simple approach:

Suppose your word has 'n' chars,
then first letter of your word can have n options,
second n -1 options
third n -2 options
and so on....
last will have just one option.
Trick is to add the options successively to the prefix, and sending
the suffix (i.e.) the options for remaining positions to function
and call the function for printing permutations recursively.


Below is the code for printing permutations of a word. Here I am
not taking into account the case of repeated letters. Which results
in printing duplicates. More thoughts for avoiding printing duplicates
efficiently:). I don't want to use a set to store the words and printing them
at the end. I want an algorithmic way out to solve the issue. Also, in this example the maximum word size is taken as 26, but that can be easily changed; just replace 26 with a bigger number:)




// Below is the code for printing permutations of an word. Here I am
// not taking into account the case of repeated letters. Which results
// in printing duplicates. More thoughts for avoiding printing duplicates
// efficiently. I don't want to use a store the words and printing them
// at them at the end. I want an algorithmic way out to solve the issue
//
// How does it work ?
// Suppose your whord has 'n' chars,
// then first letter of your word can have n options,
// second n -1 options
// third n -2 options
// and so on....
// last will have just one option.
// Trick is to add the options successively to the prefix, and sending
// the suffix (i.e.) the options for remaining positions to function
// and call the function print_permutations_word repeatedly
//

#include <stdio.h>
#include <string.h>

int print_permutations_word(const char *prefix, const char *suffix, int suffixlen)
{
    char myprefix[26] = "";
    char mysuffix[26] = "";
    int i = 0, j = 0, k = 0;
    if (suffixlen == 0) {
        printf("word %s\n", prefix);
        return 0;
    }

    while (i < suffixlen) {
        memset(myprefix, 0, 26);
        memset(mysuffix, 0, 26);
        snprintf(myprefix,26,"%s%c", prefix,suffix[i]);
        j = 0;
        k = 0;
        while (j < suffixlen) {
           if (i != j){
                mysuffix[k++] = suffix[j];
           }
           j++;
        }
        i++;
        print_permutations_word(myprefix, mysuffix, suffixlen - 1);
    }
    return 0;

}

#if 0
//example run
int main()
{
print_permutations_word("", "abcde", 5);
return 0;
}
#endif





Next approach describes how we can print all permutations of the characters using iterative method based on the principle of getting the next permutation. The advantage of this approach is that we don't print duplicates and efficient for longer strings. We start with "smallest" word and successively print the next bigger word stopping at the "biggest" word. Heart of this approach is in getting the next permutation based on the current permutation.

# Given a word(current permutation), how do we find the next permutation?
# ------------------------------
# Start from last char and successively compare a char to the character in its
# left. If the left character is smaller than the current character (say the
# position of the left character is exchange position), exchange the left
# charecter with a charecter to its right which is bigger than it and
# nearest to it. When the exchange happens, sort the character array to the
# right of the exchange position.
# Repeat the above process until the iteration when no exchange of characters
# happens (i.e. it reaches the "biggest" word)

Below example describes the approach:

Current permutation “aerqpdcb”, it is represented in the array below:



a e r q p d c b


Start with rightmost char (b) and proced left to find the first character which is smaller than its right character.

So, we reached 'e' (exchange position is 1).
On the right side of e, we find p is the character which is bigger than e and nearerst to it.

Exchange, p and e and the array now becomes:


a p r q e d c b


Now, sort the array to the right of exchange position and the array becomes:


a p b c d e q r


So, next permutation after “aerqpdcb” is “apbcdeqr”


Below is the Python program (also available at github )  that
implements the approach:

# Below function demonstrates how we can print the permutations of the letters
# in a word

###############################################################################
# HOW it works?
# it starts with the "smallest" possible word and successively prints the next
# bigger word. For example, the smallest possible word from the chars in 
# word 'axprq' is 'apqrx' and then the next bigger word is 'apqxr' and so on.
# It stops after printing the "biggest" word which is 'xrqpa'.
#
# The advantage of this algorithm is that we don't print any duplicate words.
#
#
# Given a word(current permutation), how do we find the next permutation?
# ------------------------------ 
# Start from last char and successively compare a char to the character in its
# left. If the left character is smaller than the current character (say the 
# position of the left character is exchange position), exchange the left 
# charecter with a charecter to its right which is bigger than it and
# nearest to it. When the exchange happens, sort the character array to the 
# right of the exchange position.
# Repeat the above process until the iteration when no exchange of characters
# happens (i.e. it reaches the "biggest" word)

def print_permute(word):
    a = sorted(word)
    l = len(word)
    exchanged = True
    while exchanged:
        print ''.join(a)
        i = l - 1
        exchanged = False
        while i != 0:
            if a[i] <= a[i - 1]:
                i -= 1
            else:
                exchanged = True
                j = i + 1
                while j < l:
                    if a[i - 1] < a[j]:
                        j += 1 
                    else:
                        break
                j -= 1
                a[i-1], a[j] = a[j], a[i-1]
                a[i:] = sorted(a[i:])
                break

#Example Run
print ('Printing permutations of xyz')
print_permute('xyz')
print('.............................')
print('.............................')
print ('Printing permutations of abcdd')
print_permute('abcdd')
print('.............................')
print('.............................')
print ('Printing permutations of axprq')
print_permute('axprq')

Monday, November 5, 2012

Apache Thrift File based transport

Today I will demonstrate how we can use Apache Thrift FileTransport. Thrift file based transport is useful when we want to store the messages in some local files if the remote server is down. We may replay the files when the remote server comes up again. If we want to implement a Thrift based messaging system, then also this feature will be useful for "store and forwarding message" approach.

exception BadOperation {
1: i32 what,
2: optional string reason
}
enum Operation {
CREATE,
DELETE,
FILECONTENT,
DIRCONTENT
}
struct Work {
1: Operation op
2: string filename
3: string data
4: string rootdir
}
service FileService {
i32 createFile(1:Work w) throws (1:BadOperation badop),
list&lt;string> getFiles(1:Work w) throws (1:BadOperation badop)
}

Please refer my previous post or google for how to use the thrift compiler to generate code for handling thrift messages following the above definitions.My example will be in C++, but a PHP, Java or Python example would look very much similar.The generated code has a file FileService.cpp. Examining the code will show that createFileinterface calls two member functions of FileServiceClient class. They aresend_createFile and recv_createFile. send_createFile actually serialize our data,and send them over the socket in case of a typical client-server scenario.recv_createFile processes the response back from the server. In case we are using File based transport to "send" data, then there won't be any response. Hence, we don't need to call recv_createFile function. In stead we just call send_createFile to "send" or write the messages to the file. This is the trick.Same is the case with writing "getFiles" messages to the file as well.Actually we can make this much more efficient by serializing and batching the writes to the file ourselves. But here I am not showing that, as my purpose is to demonstrate the basic operations of File based transport.
Below I have pasted my code for "writing" messages to the file (file_writer.cpp)
#include <stdlib.h>
#include <time.h>  
#include <iostream>
#include <sstream> 
#include <protocol/TBinaryProtocol.h>
#include <transport/TSocket.h>       
#include <transport/TTransportUtils.h>

#include "FileService.h"

using std::cout;
using std::endl;
using std::stringstream;
using namespace apache::thrift;
using namespace apache::thrift::protocol;
using namespace apache::thrift::transport;

using namespace FileHandler;

using namespace boost;

int main(int argc, char** argv) {
    shared_ptr<TTransport> file(new TFileTransport("testfiletransport"));
    shared_ptr<TTransport> transport(new TBufferedTransport(file));      
    shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));      
    FileServiceClient client(protocol);                                  
    try {                                                                
        int i = 0;                                                       
        stringstream ss (stringstream::in | stringstream::out);          
        Work work;
        work.data = "mydata";
        work.rootdir = "/home/nipun/test";

        try {
            while (i++ < 100) {
                work.op = Operation::CREATE;
                ss << "filename" << i ;
                work.filename = ss.str();
                ss.clear();
                ss.str("");
                ss << "data" << rand() << "-" << rand() << time(0);
                work.data = ss.str();
                ss.clear();
                ss.str("");
                client.send_createFile(work);

                work.op = Operation::DIRCONTENT;
                ss << "filename" << i  << rand();
                work.filename = ss.str();
                ss.clear();
                ss.str("");
                ss << "data" << rand() << "-" << rand() << time(0) << rand();
                work.data = ss.str();
                ss.clear();
                ss.str("");
                client.send_getFiles(work);
            }
        } catch (BadOperation &op) {
            cout << "Exception "  <<  op.reason  << endl;
        }

    } catch (TException &tx) {
        cout <<  "ERROR: " << tx.what() << endl;
    }
}


Here we wrote "createFile" and "getFiles" messages 100 times each to the file "testfiletransport" in current directory. We used send_createFiles and send_getFiles routines for the same. Remember that we have to use exact similar type of transport while reading from the file also. TBinary protocol specifies how the data is serialized and TBuffered transport is used to buffer data before they are flushed to underlying transport which is the file "testfiletransport" here.

Now is the time to read the data from the file. Generally we write the data to a file and read it later when want to replay them later. In such cases the messages are read and send over another transport which may be over TSocket (socket based transport class defined in Thrift library). But in this example we will just read the messages and print them to standard out.
 

Code for file_reader.cpp

#include <stdlib.h>
#include <time.h>  
#include <iostream>
#include <sstream> 
#include <string>  
#include <protocol/TBinaryProtocol.h>
#include <transport/TSocket.h>       
#include <transport/TTransportUtils.h>

#include "FileService.h"

using std::cout;
using std::endl;
using std::stringstream;
using std::string;      
using namespace apache::thrift;
using namespace apache::thrift::protocol;
using namespace apache::thrift::transport;

using namespace FileHandler;

using namespace boost;

int main(int argc, char** argv) {
    shared_ptr<TTransport> file(new TFileTransport("testfiletransport", true));
    shared_ptr<TTransport> transport(new TBufferedTransport(file));
    shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));
    FileServiceClient client(protocol);
    string fname;
    TMessageType mtype;
    int32_t seqid;
    Work w;
    try {
        while (true) {
            protocol->readMessageBegin(fname, mtype, seqid);
            cout << fname << endl;
            if (fname == "createFile") {
                FileService_createFile_args args;
                args.read(protocol.get());
                protocol->readMessageEnd();
                w = args.w;
            } else if (fname == "getFiles") {
                FileService_getFiles_args args;
                args.read(protocol.get());
                protocol->readMessageEnd();
                w = args.w;
            }
            cout <<"\tData:\t" << w.data << endl;
            cout <<"\tFilename:\t" << w.filename << endl;
            cout <<"\tRootdir:\t" << w.rootdir << endl;
            cout <<"\tOperation:\t" << w.op << endl;

        }
    } catch (TTransportException& e) {
        if (e.getType() == TTransportException::END_OF_FILE) {
            cout << "\n\n\t\tRead All Data successfully\n";
        }
    } catch (TException &tx) {
        cout << "ERROR " <<  tx.what() << endl;
    }
}


In the file_reader.cpp example, we first read the message type using readMessageBegin interface of binary protocol.  Then we use "fname" (function name) field to decide if the message is a "createFile" or "getFiles" message. Then we use use appropriate FileService args class to read the actual message.
After that we just print the message on our terminal :)

Hope this will be useful for you and if so please don't forget to leave a comment here :)

In my next post, I will show how we may build a simple messaging system using Apache Thrift. Hopefully you will visit again.


Saturday, October 13, 2012

Distributed scalable log aggreation with Scribe

Today I will explain how to configure scribe log server and how to aggregate log from different location to a central location.
Scribe uses boost library and Apache Thrift. Both scribe and Apache Thrift  were developed by Facebook and were open sourced. Latest release of scribe was 3 years ago and hence it may not build successfully with latest g++ compiler. I am using 4.4.4 version of g++ and I am building on Linux (CentOS 6.0 64 bit) platform and hence all my examples are for Linux platform only.
I am using boost library (Boost version 1.46.1). Problem is scribe won't build with this library also and hence we will need to change few cpp files in scribe to build it successfully.

We download and extract Boost archive and go to the directory where it was extracted. Then issue the below commands:

$ sh bootstrap.sh --prefix=/usr/local
$ ./bjam install

It will boost libraries and header files under /usr/local/include and /usr/local/lib.
Then download thrift and build it. Thrift will need libevent libarary.

$rpm -qa | grep libevent-devel

If the above command doesn't return anything, then we execute the command below:

$yum install libevent-devel

We have to do apt-get on debian/ubuntu.

Now extract the thrift archive and build it.
Cd to the directory where you extracted the thrift sources.
$ ./configure
$  make
$  make install

Generally, thrift will install in /usr/local directory. If you had installed maven , it will build and install JAVA jars as well in /usr/local/lib.

It is better to add the below headers in the /usr/local/incude/thrift/Thrift.h. Reason being, I was getting few compilation errors such as uint32_t/int32_t not being recognized; htons,ntohl etc. being reported as unknown functions etc. etc.

#include <stdint.h>
#include <inttypes.h>
#include <arpa/inet.h>

Now thrift is there for us. We built boost already.

Building scribe
Now, it's time to compile scribe. I downloaded scribe-2.2.tar.gz and  I explain building for this version only.
Extract the scribe archive and cd to the directory where you extracted the archive. Now issue the below commands:

$export LD_LIBRARY_PATH=

 In my case both of them were installed in /usr/local/lib. So, I issued the below command:

$export LD_LIBRARY_PATH=/usr/local/lib

$sh bootstrap.sh --with-boost=. In my case boost headers/libraries installed in /usr/local. So, I issued the below command:
$sh bootstrap.sh --with-boost=/usr/local
$make

I got compilation error for conflicting return type for virtual scribe::thrift::ResultCode scribeHandler::Log in scribe_server.h. I solved that by looking at the type declared in scribe.h and following the same for scribe_server.h , i.e., I changed return type of scribeHandler::Log to scribe::thrift::ResultCode::type.

Then I got the below error in another source file file.cpp.
file.cpp:203: error: ‘class boost::filesystem3::directory_entry’ has no member named ‘filename’

This was due to the higher version of boost library I am using. So, I made it compatible by replacing the below line in file.cpp 
 _return.push_back(dir_iter->filename());
with 
_return.push_back(dir_iter->path().filename().string()); 
  
Till there are more compilation errors :). This time the error is in conn_pool.cpp and it says TRY_LATER and OK are not declared (:
So, we replace occurrences of TRY_LATER with ResultCode::TRY_LATER and OK with ResultCode::OK. Also, we have to replace "ResulCode result" with "ResultCode::type result".

Now the compilation error scribe_server.cpp is resolved by replacing "scribe::thrift::ResultCode scribeHandler::Log" with scribe::thrift::ResultCode::type scribeHandler::Log. Fix for ResultCode related compilation issues are fixed in the same way we did for conn_pool.cpp.
Finally, everything compiled successfully and hence we can install scribe.

$make install

Now we can run scribe! Running scribe is simple because the required configuration file is not complex to understand and the scribe package already have some good examples.

In the directory "if" under the root scribe directory (where we extracted scribe.tar.gz) there is the thrift idl file scribe.thrift. This file describes the Log service and also the structure of the log messages that can be sent to scribe log server.
Below is the content of the scribe.thrift:

include "fb303/if/fb303.thrift"

namespace cpp scribe.thrift

enum ResultCode
{
  OK,
  TRY_LATER
}

struct LogEntry
{
  1:  string category,
  2:  string message
}

service scribe extends fb303.FacebookService
{
  ResultCode Log(1: list messages);
}


This shows that each messages are to be sent via the "Log" routine defined by scribe service and it takes a vector of LogEntry Messages as input. LogEntry message has two fields, category and message and both are strings.  By default scribe logger creates a separate directory for each category of messages (unless we explicitly configure scribe not to do so).  This file also include the file fb303/if/fb303.thrift. This was installed on /usr/local/share/fb303/if/fb303.thrift on my system. So, lets generate the cpp files by using thrift compiler.

$thrift -I /usr/local/share -gen cpp   scribe.thrift

This will create gen-cpp directory.

$ls gen-cpp
scribe_constants.cpp  scribe_constants.h  scribe.cpp  scribe.h  scribe_server.skeleton.cpp  scribe_types.cpp  scribe_types.h

We have all the files except the client executable which will call the Log routine and send the messages to the scribe server(s).
Below is the simple client code (save the code in client.cpp).
#include <stdio.h>            
#include <unistd.h>           
#include <sys/time.h>         

#include <iostream>
#include <sstream> 

#include <protocol/TBinaryProtocol.h>
#include <transport/TSocket.h>       
#include <transport/TTransportUtils.h>

#include "scribe.h"

using std::cout;
using std::endl;
using std::vector;
using boost::shared_ptr;
using std::stringstream;

using namespace apache::thrift;
using namespace apache::thrift::protocol;
using namespace apache::thrift::transport;
using namespace scribe;                   
using namespace scribe::thrift;           


int main(int argc, char** argv) {

    if (argc != 3) {
        cout << "Usage: " << argv[0] << "  host " <<  "port" << endl;
        exit(1);                                                                                   
    }                                                                                              

    shared_ptr<TTransport> socket(new TSocket(argv[1], atoi(argv[2])));
    shared_ptr<TTransport> transport(new TFramedTransport(socket));
    shared_ptr<TProtocol> protocol(new TBinaryProtocol(transport));

    // Cretae a scribe client
    scribeClient client(protocol);
    // Vector of Logentry
    vector<LogEntry> logdata;

    try {
        stringstream ss;
        try {
            transport->open();
            int i = 0;
            LogEntry a ;
            a.category = "MyCat";

            // We will send a 1000 messages to the scribe log server
            while ( i++ < 1000 ) {
                ss << "Hello " << i ;
                a.message = ss.str();
                ss.str("");
                ss.clear();
                logdata.push_back(a);
            }
            client.Log(logdata);
            logdata.clear();
        } catch (...) {
            cout << "Got some exception " << endl;
        }
        transport->close();
    } catch (TException &tex) {
        printf("Exception: %s\n", tex.what());
    }
}


Now we do the following:
1. Compile the client programs as shown below:

$g++ -g  -o scribe_client -I . -I /usr/local/include/thrift -I /usr/local/include/thrift/fb303 client.cpp scribe_types.cpp  scribe_constants.cpp  scribe.cpp     -L /usr/local/lib -lthriftnb -lthrift -levent  /usr/local/lib/libfb303.a
Note that we provided the include directory location, and library locations for thrift,boost and facebook interface library (fb303).
We will run all the central logger, client logger and the client program in the same machine for demonstration.
 
2. Scribe central logger example configuration is present in examples directory of scribe source (example2central.conf). I modified it to add a newline after every log messages by adding "add_newlines=1"  in the section of the file. Log files will be created under some sub-directory in /tmp/scribetest. For our example, the su-directory is /tmp/scribetest/MyCat. We run the scribe central logger as shown below:

$scribed -c example2central.conf

3.  example2client.conf is configured to send logs to the central logger. the client logger listens on 1464 port for log messages and it forwards the messages to the central logger. We run scribe client logger by issuing the below command:

$scribed -c example2client.conf

4. Now we run our client. It connects to the client logger and sends logs to the client logger who is listening on port 1464. We run the client program:
$./scribe_client 127.0.0.1 1464

Now we can see our log messages in files in /tmp/scribetest/MyCat directory.

How do we write a client in java? It is simple. Follow the steps given below:
$thrift -I /usr/local/share -gen java   scribe.thrift
This generates the below files under gen-java:
  • LogEntry.java  
  • ResultCode.java  
  • scribe.java
We have to build libfb303-0.8.0.jar if it is not done already.  Cd to thrift-0.8.0/contrib/fb303/java and issue "ant "  build command. Basically we may need the below jars (which are part of thrift package or generated while building thrift).
  • commons-codec-1.4.jar
  • commons-lang-2.5.jar
  • commons-logging-1.1.1.jar
  • httpclient-4.1.2.jar
  • httpcore-4.1.3.jar
  • junit-4.4.jar
  • libfb303-0.8.0.jar
  • libthrift-0.8.0.jar
  • log4j-1.2.14.jar
  • servlet-api-2.5.jar
  • slf4j-api-1.5.8.jar
  • slf4j-log4j12-1.5.8.jar  
We need to write few lines of code to send messages to scribe logger though which will basically calls the "Log" routine defined in scribe.java.
Below is the code:

import org.apache.thrift.TException;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.protocol.TProtocol;      
import org.apache.thrift.transport.TSocket;       
import org.apache.thrift.transport.TTransport;    
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TTransportException;
import java.util.List;                                 
import java.util.ArrayList;                            

public class Main {

        public static void main(String[] args) {

                if (args.length != 2) {
                        System.out.println(" <Host> <Port> missing");
                        System.exit(1);
                }
                int port = -1;
                try {
                        port = Integer.parseInt(args[1]);
                } catch (NumberFormatException e) {
                        System.exit(1);
                }
                System.out.println(args[0]);
                System.out.println(args[1]);
                TTransport tr = new TFramedTransport(new TSocket(args[0], port));
                TProtocol proto = new TBinaryProtocol(tr);
                scribe.Client client = new scribe.Client(proto);

                int i = 0;
                List<LogEntry> list = new ArrayList<LogEntry>();
                LogEntry log = null;
                while (i < 100){
                        log = new LogEntry();
                        log.setCategory("javamessage");
                        log.setMessage("My Message " + i);
                        list.add(log);
                        i++;
                }
                try {
                        tr.open();
                        client.Log(list);
                } catch (org.apache.thrift.TException e){
                        e.printStackTrace();
                }
        }
}  

Save this in a file Main.java and compile this along with LogEntry.java, scribe.java,
ResultCode.java. While compiling and running, we have to put the jars listed above
in java classpath.