Tag archives: java

Kryo and FastUtil wrapper

While working with Apache Spark, I had trouble serializing a FastUtil wrapper with the Kryo serializer. I spent a day on it and found a solution.
Here is example code that uses an Int2LongOpenHashMap wrapper with a public long counter attribute.

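// WrappedInt2LongOpenHashMap: the Int2LongOpenHashMap wrapper, with public fields counter and inputName
WrappedInt2LongOpenHashMap p = new WrappedInt2LongOpenHashMap();
p.addTo(220, 20);   // addTo(key, increment) adds to the current value (0 if the key is absent)
p.addTo(30, 5);
p.addTo(30, 15);    // now 30 -> 20
p.addTo(220, 5);    // now 220 -> 25
p.counter = 10;
p.inputName = "prova";

System.out.println(p.counter);
System.out.println("---------");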

The default Java serializer works fine: it serializes both the map and the counter attribute.
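To make the comparison concrete without extra dependencies, here is a minimal sketch that uses java.util.HashMap as a stand-in for FastUtil's Int2LongOpenHashMap (WrappedMap and roundTrip are hypothetical names, not the post's code). Standard Java serialization writes the serializable fields of every class in the hierarchy, so the subclass fields survive alongside the entries. Kryo, by default, handles classes that implement Map with its MapSerializer, which writes only the entries; that is most likely why the counter gets lost there.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;

// Stand-in for WrappedInt2LongOpenHashMap: a java.util.HashMap subclass
// (not FastUtil) carrying the same extra public fields.
class WrappedMap extends HashMap<Integer, Long> {
    private static final long serialVersionUID = 1L;
    public long counter;
    public String inputName;

    // Mimics FastUtil's addTo(key, increment) using Map.merge.
    public void addTo(int key, long incr) {
        merge(key, incr, Long::sum);
    }
}

class JavaSerializationDemo {
    // Serializes and deserializes an object with standard Java serialization.
    static <T extends Serializable> T roundTrip(T obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in = new ObjectInputStream(
                    new ByteArrayInputStream(bytes.toByteArray()))) {
                @SuppressWarnings("unchecked")
                T copy = (T) in.readObject();
                return copy;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        WrappedMap p = new WrappedMap();
        p.addTo(220, 20);
        p.addTo(30, 5);
        p.addTo(30, 15);
        p.addTo(220, 5);
        p.counter = 10;
        p.inputName = "prova";
        WrappedMap copy = roundTrip(p);
        // prints "10 prova 25 20": both the fields and the entries survive
        System.out.println(copy.counter + " " + copy.inputName + " " + copy.get(220) + " " + copy.get(30));
    }
}
```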

Bisecting KMeans on Hadoop without wasting disk space

In recent months, a university colleague, Gino Farisano, and I have been working on a Hadoop KMeans project for document clustering. This work was done for the Advanced Operating Systems exam, under the supervision of Prof. Cattaneo and Ph.D. student Roscigno.
While working on Bisecting KMeans, we read that existing implementations (at least those we found) write the documents of each newly found cluster to a new HDFS directory. This wastes disk space and hurts the algorithm's performance.

Fortunately, we developed an algorithm that doesn't write anything new to HDFS.

Note: working on documents is like working on a sparse matrix, where every document is a row and every word is a column. For the implementation we simply used a HashMap(Word, Value).
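The pseudocode below calls distance(row, centroid) without defining it. As an assumption, here is a sketch of Euclidean distance over HashMap-based sparse vectors, where a word missing from a map counts as 0; the post does not say which metric was used (cosine distance is another common choice for documents). SparseVectors is a hypothetical name.

```java
import java.util.Map;

class SparseVectors {
    // Euclidean distance between two sparse vectors (word -> value).
    // Only non-zero columns are stored, so a missing word means 0.
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        // Words present in a (and possibly in b).
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double diff = e.getValue() - b.getOrDefault(e.getKey(), 0.0);
            sum += diff * diff;
        }
        // Words present only in b contribute their full squared value.
        for (Map.Entry<String, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                sum += e.getValue() * e.getValue();
            }
        }
        return Math.sqrt(sum);
    }
}
```

Iterating over the stored entries only keeps the cost proportional to the number of non-zero words, not to the vocabulary size.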

Hadoop Pseudo Code

For every bisection, two jobs are executed:

  1. The first job selects two random documents from the biggest cluster.
  2. The second job is the k-means job, which can be executed #iterations times.
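The first job is only described at a high level. A sequential sketch of choosing two distinct random documents as seeds for the two sub-clusters might look like this; pickTwo is a hypothetical helper that works on an in-memory list purely for illustration.

```java
import java.util.List;
import java.util.Random;

class RandomSeeds {
    // Picks two distinct documents of a cluster as the initial centroids
    // of its two sub-clusters.
    static <T> List<T> pickTwo(List<T> clusterDocs, Random rnd) {
        if (clusterDocs.size() < 2) {
            throw new IllegalArgumentException("a cluster needs at least two documents to be bisected");
        }
        int i = rnd.nextInt(clusterDocs.size());
        // Draw j among the remaining size-1 positions, then skip over i so that j != i.
        int j = rnd.nextInt(clusterDocs.size() - 1);
        if (j >= i) {
            j++;
        }
        return List.of(clusterDocs.get(i), clusterDocs.get(j));
    }
}
```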

At the end of each bisection, we remove from the list of centroids the centroid that has the most documents and add the two new centroids. This is the pseudocode.

The input of the mapper (row) is a pair <K, V>, where K is the document name and V is the list of pairs <Word, Value>.

As you can see in the Java code (at the end of the page), we use the Document class, which merges documents and normalizes the newly found centroid.
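The Document class itself is not shown here. As an assumption, this sketch keeps a running sum of the merged word vectors plus the number of documents they represent, and normalize() divides the sum by that count to obtain the mean vector, i.e. the centroid. Carrying the count matters because the combiner emits partial sums: dividing by the number of values the reducer happens to receive would give a wrong mean. If the original normalize() means something else (for example unit-length scaling), only that method changes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Document: accumulates summed word vectors and a document count.
class Document {
    private final String key;
    private final Map<String, Double> sum = new HashMap<>();
    private long count = 0;

    Document(String key) {
        this.key = key;
    }

    String key() {
        return key;
    }

    long count() {
        return count;
    }

    // Merges a single document.
    void add(Map<String, Double> vector) {
        add(vector, 1);
    }

    // Merges a vector that already sums `docs` documents (e.g. a combiner output).
    void add(Map<String, Double> vector, long docs) {
        vector.forEach((word, value) -> sum.merge(word, value, Double::sum));
        count += docs;
    }

    // Assumption: "normalize" turns the running sum into the mean vector.
    Map<String, Double> normalize() {
        Map<String, Double> centroid = new HashMap<>();
        sum.forEach((word, value) -> centroid.put(word, value / count));
        return centroid;
    }
}
```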

Mapper

Mapper(row <K,V>):
indexNear = -1; distMin = inf;

//Get the centroid nearest to the document
for Centroids as index => centroid {
  dist = distance(row, centroid)
  if dist < distMin {
    indexNear = index
    distMin = dist
  }
}
//biggestCentroid is read in setup()
//only documents of the biggest cluster take part in the bisection
if (indexNear == biggestCentroid) {
  indexNear = -1; distMin = inf;
  for RndCentroids as ind => centroid {
     dist = distance(row, centroid)
     if dist < distMin {
       indexNear = ind
       distMin = dist
     }
  }
  //indexNear is the random centroid nearest to the document
  send(indexNear, row)
}
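For readers who prefer Java, here is a plain sketch of the same mapper logic outside Hadoop (no Mapper or Context types). Euclidean distance is an assumption, as above, and a return value of -1 stands for "emit nothing". BisectMapper, nearest and assign are hypothetical names.

```java
import java.util.List;
import java.util.Map;

class BisectMapper {
    // Euclidean distance between sparse vectors (assumed metric); missing words count as 0.
    static double distance(Map<String, Double> a, Map<String, Double> b) {
        double sum = 0.0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            double diff = e.getValue() - b.getOrDefault(e.getKey(), 0.0);
            sum += diff * diff;
        }
        for (Map.Entry<String, Double> e : b.entrySet()) {
            if (!a.containsKey(e.getKey())) {
                sum += e.getValue() * e.getValue();
            }
        }
        return Math.sqrt(sum);
    }

    // Index of the nearest centroid, or -1 if the list is empty.
    static int nearest(Map<String, Double> row, List<Map<String, Double>> centroids) {
        int indexNear = -1;
        double distMin = Double.POSITIVE_INFINITY;
        for (int i = 0; i < centroids.size(); i++) {
            double dist = distance(row, centroids.get(i));
            if (dist < distMin) {
                indexNear = i;
                distMin = dist;
            }
        }
        return indexNear;
    }

    // Key the mapper would emit: the nearest random centroid if the row belongs
    // to the biggest cluster, otherwise -1 (the row is not sent at all).
    static int assign(Map<String, Double> row, List<Map<String, Double>> centroids,
                      int biggestCentroid, List<Map<String, Double>> rndCentroids) {
        if (nearest(row, centroids) != biggestCentroid) {
            return -1;
        }
        return nearest(row, rndCentroids);
    }
}
```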

Combiner

Combiner(List<K,V> input):
Document aggreg = new Document(K);
for Document row in input {
    aggreg.add(row)
}
send(aggreg.key(), aggreg.value())

Reducer

Reducer(List<K,V> input):
Document aggreg = new Document(K);
for Document row in input {
   aggreg.add(row)
}
aggreg.normalize()
send(aggreg.key(), aggreg.value())

So the output of the reducer is the new centroid computed by the MapReduce job.
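Once the two new centroids come out of the reducer, the end-of-bisection bookkeeping described earlier (remove the centroid with the most documents, add the two new ones) could look like this in plain Java. CentroidList and its methods are hypothetical, and the per-centroid document counts are assumed to be available.

```java
import java.util.ArrayList;
import java.util.List;

class CentroidList {
    // Index of the centroid that owns the most documents.
    static int biggest(List<Integer> docCounts) {
        int best = 0;
        for (int i = 1; i < docCounts.size(); i++) {
            if (docCounts.get(i) > docCounts.get(best)) {
                best = i;
            }
        }
        return best;
    }

    // Returns a new list without the biggest centroid and with the two new centroids appended.
    static <C> List<C> replaceBiggest(List<C> centroids, List<Integer> docCounts, C left, C right) {
        List<C> next = new ArrayList<>(centroids);
        next.remove(biggest(docCounts)); // int argument: removes by index
        next.add(left);
        next.add(right);
        return next;
    }
}
```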

Java Code

This is the sequential BisecKMeans Java code. The Hadoop code will be released at the end of our university work.

Hadoop Code

Here is the code, or what’s left of it 😛

How to encrypt a Java object with JCA

In Java, to store encrypted objects you can use JCA, the Java Cryptography Architecture, which is available by default in the Java platform.

JCA is designed around cryptographic “engines” and defines classes that provide the functionality of these engines.
For our purposes we will use the Cipher engine together with SealedObject, a class designed precisely for storing encrypted objects.
The cipher used is AES in CBC mode with PKCS5Padding.
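A minimal sketch of this approach with standard JCA classes: a 128-bit AES key from KeyGenerator, a Cipher in "AES/CBC/PKCS5Padding", and javax.crypto.SealedObject holding the encrypted object. SealDemo and its helpers are hypothetical names, and key storage is out of scope. SealedObject keeps the cipher parameters (here the random IV), so getObject(Key) can rebuild the decryption cipher on its own. Note that CBC gives confidentiality, not integrity.

```java
import java.io.IOException;
import java.io.Serializable;
import java.security.GeneralSecurityException;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SealedObject;
import javax.crypto.SecretKey;

class SealDemo {
    static final String TRANSFORMATION = "AES/CBC/PKCS5Padding";

    // Generates a random 128-bit AES key.
    static SecretKey newKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Encrypts a Serializable object; the cipher picks a random IV in ENCRYPT_MODE.
    static SealedObject seal(Serializable obj, SecretKey key) {
        try {
            Cipher cipher = Cipher.getInstance(TRANSFORMATION);
            cipher.init(Cipher.ENCRYPT_MODE, key);
            return new SealedObject(obj, cipher); // stores ciphertext, algorithm and IV
        } catch (GeneralSecurityException | IOException e) {
            throw new IllegalStateException(e);
        }
    }

    // Decrypts using the algorithm and IV recorded inside the SealedObject.
    static Object unseal(SealedObject sealed, SecretKey key) {
        try {
            return sealed.getObject(key);
        } catch (GeneralSecurityException | IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A SealedObject is itself Serializable, so it can be written to disk with an ObjectOutputStream like any other object.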


Handwritten Signature SDK

Handwritten Signature SDK was developed for a computer science thesis at UniSa.
The project was born from the need for a fairly complete and free SDK for working with handwritten signatures.
The SDK is based on the signature-verification project, which implements the DTW and ER2 algorithms; to these I added further features taken from scientific articles:

  • Dynamic time warping with a Sakoe-Chiba band.
  • Extended Regression in 2 dimensions.
  • Signature sampling by point coordinates rather than from an image.
  • Directional Hash: generates a hash from a signature according to its X and Y movements.
  • Signature rotation around the barycenter: this technique needs more testing, but it works if the signatures are similar.
  • Methods to extract velocity, internal and external angles, and critical points.
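For reference, here is a minimal sketch of DTW constrained by a Sakoe-Chiba band on one-dimensional sequences (for example, the X coordinates of two signatures). It is not the SDK's code: the absolute-difference local cost and widening the band to at least |n - m| (otherwise the end cell is unreachable) are assumptions.

```java
import java.util.Arrays;

class Dtw {
    // DTW distance where cell (i, j) is only considered if |i - j| <= band.
    static double distance(double[] a, double[] b, int band) {
        int n = a.length, m = b.length;
        int w = Math.max(band, Math.abs(n - m));
        double[][] cost = new double[n + 1][m + 1];
        for (double[] row : cost) {
            Arrays.fill(row, Double.POSITIVE_INFINITY);
        }
        cost[0][0] = 0.0;
        for (int i = 1; i <= n; i++) {
            // Only visit the columns inside the band around the diagonal.
            int from = Math.max(1, i - w);
            int to = Math.min(m, i + w);
            for (int j = from; j <= to; j++) {
                double local = Math.abs(a[i - 1] - b[j - 1]);
                double best = Math.min(cost[i - 1][j - 1], Math.min(cost[i - 1][j], cost[i][j - 1]));
                cost[i][j] = local + best;
            }
        }
        return cost[n][m];
    }
}
```

The band cuts the work from O(n·m) to roughly O(n·w) cells and prevents pathological warpings that match distant parts of two signatures.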

Handwritten Signature SDK 0.2

A solution to livelock in Java


Excerpt

Livelock: A thread often acts in response to the action of another thread. If the other thread’s action is also a response to the action of another thread, then livelock may result. As with deadlock, livelocked threads are unable to make further progress. However, the threads are not blocked — they are simply too busy responding to each other to resume work. This is comparable to two people attempting to pass each other in a corridor: Alphonse moves to his left to let Gaston pass, while Gaston moves to his right to let Alphonse pass. Seeing that they are still blocking each other, Alphone moves to his right, while Gaston moves to his left. They’re still blocking each other, so…

Source: Lesson: Concurrency (The Java Tutorials)

Java’s concurrency API offers a good way to avoid deadlock: locking through the Lock interface in java.util.concurrent.locks and its implementations, such as ReentrantLock. For livelock, on the other hand, there is not much to be found around (as of 5 November 2011), so I decided to release a small solution of my own.

The solution I propose here relies on the java.util.concurrent.ExecutorService interface and on an interface of my own, LiveLockSolution, which reminds me to implement an anti-livelock method in the classes that handle shared data:

public interface LiveLockSolution {
    public void liveLockSolution();
}

Here is an example Prodotto (product) class that uses locks to avoid deadlock and implements LiveLockSolution:

import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

public class Prodotto implements LiveLockSolution {
    private int quantità;
    private boolean endLoop;
    private final ReentrantLock lock;
    private final Condition cond;

    public Prodotto(ReentrantLock lock) {
        this.endLoop = false;
        this.quantità = 0;
        this.lock = lock;
        this.cond = lock.newCondition();
    }

    /**
     * Produces one item and wakes up the waiting consumers
     */
    public void produci() {
        lock.lock();
        try {
            this.quantità++;
            System.out.println("Q:" + this.quantità);
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

    /**
     * Consumes one item; if none is available, waits
     */
    public void consuma() {
        lock.lock();
        try {
            while (!endLoop && quantità == 0) {
                cond.await();
            }
            if (!endLoop) { // if I have been told to stop, do nothing
                this.quantità--;
                System.out.println("Q:" + this.quantità);
            }
            cond.signalAll();
        } catch (InterruptedException ex) {
            Thread.currentThread().interrupt(); // restore the interrupt status
        } finally {
            lock.unlock();
        }
    }

    /**
     * Lets the threads that use Prodotto and are still waiting terminate
     */
    @Override
    public void liveLockSolution() {
        lock.lock();
        try {
            this.endLoop = true; // tell consuma() not to wait for further production
            cond.signalAll();
        } finally {
            lock.unlock();
        }
    }

}

I’ll leave the Produttore (producer) and Consumatore (consumer) classes to your imagination; here is the main method:

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

public static void main(String[] args) {
    try {
        ReentrantLock lock = new ReentrantLock();
        Prodotto obj = new Prodotto(lock);
        // Produttore and Consumatore (not shown) are Runnables with a stop() method
        Consumatore cons = new Consumatore(obj);
        Consumatore cons2 = new Consumatore(obj);
        Produttore prod = new Produttore(obj);

        System.out.println("Start of concurrency\n\n");

        // Use Java's concurrency API instead of managing threads directly
        ExecutorService es = Executors.newCachedThreadPool();
        // run the tasks
        es.execute(cons);
        es.execute(cons2);
        es.execute(prod);

        // let them work for one second
        Thread.sleep(1000);

        // stop all the tasks
        prod.stop();
        cons.stop();
        cons2.stop();

        // accept no new tasks
        es.shutdown();
        // wait for the tasks in the ExecutorService to terminate
        while (!es.awaitTermination(100, TimeUnit.MILLISECONDS)) {
            System.out.println("Waiting for threads to terminate");
            obj.liveLockSolution(); // resolve the livelock
        }
        System.out.println("End of concurrency");

    } catch (InterruptedException e) {
        e.printStackTrace();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

As you can see, liveLockSolution is called every time awaitTermination returns false, that is, whenever threads in the ExecutorService pool are still active after the 100 ms timeout.
To try it, just download the source and run it.
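Since the producer and consumer classes are left to the reader, here is a compact, self-contained sketch of the same pattern under hypothetical names (Stock, StockConsumer, LivelockDemo are not from the downloadable source). Consumers stop through a volatile flag, but a consumer already parked in await() never rechecks it; release() plays the role of liveLockSolution() and wakes it for good, so the awaitTermination loop can finish.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class Stock {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition cond = lock.newCondition();
    private int quantity = 0;
    private boolean endLoop = false;

    void produce() {
        lock.lock();
        try { quantity++; cond.signalAll(); } finally { lock.unlock(); }
    }

    // Returns false once release() has been called, so the caller can exit its loop.
    boolean consume() throws InterruptedException {
        lock.lock();
        try {
            while (!endLoop && quantity == 0) {
                cond.await();
            }
            if (endLoop) {
                return false;
            }
            quantity--;
            return true;
        } finally { lock.unlock(); }
    }

    // Same role as liveLockSolution(): stop waiting and wake every consumer.
    void release() {
        lock.lock();
        try { endLoop = true; cond.signalAll(); } finally { lock.unlock(); }
    }
}

class StockConsumer implements Runnable {
    private final Stock stock;
    private volatile boolean running = true;

    StockConsumer(Stock stock) { this.stock = stock; }

    void stop() { running = false; }

    @Override
    public void run() {
        try {
            while (running && stock.consume()) {
                // keep consuming until stopped or released
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}

class LivelockDemo {
    // Returns true once the pool has terminated.
    static boolean runDemo() {
        Stock stock = new Stock();
        StockConsumer c1 = new StockConsumer(stock);
        StockConsumer c2 = new StockConsumer(stock);
        ExecutorService es = Executors.newCachedThreadPool();
        es.execute(c1);
        es.execute(c2);
        stock.produce(); // a single item: at least one consumer ends up waiting
        try {
            Thread.sleep(100);
            c1.stop();
            c2.stop();      // a consumer parked in await() does not see this flag
            es.shutdown();  // accept no new tasks
            while (!es.awaitTermination(100, TimeUnit.MILLISECONDS)) {
                stock.release(); // wake the waiting consumers
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return es.isTerminated();
    }
}
```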

Poll Manager

PollManager is a cross-platform suite of programs for managing and taking polls.
PollWriter is the component for managing polls: you can create them, open them and analyze the answers given.

PollAnswer is the component used to answer polls.

The programs were developed in Java to make them cross-platform.

Executable | Source