Using SQLite in a Threaded Java App


A few months ago I decided to port my WoW data importer app from PHP to Java because the website was already built in Java. I also wanted to see if execution time could be improved in a multi-threaded app.

The PHP script stores the JSON returned from the WoW API into individual JSON files for each item. This means there are thousands of JSON cache files. There’s a second script that packs these responses into a single file, and then a third script that imports the data from the consolidated file in the second script.

The reason for this two-step process was to see the difference in time it took my script to process the individual files vs a consolidated file, as describe in my blog post on optimizing the PHP importer script.

This approach was overcomplicated and pretty inefficient. While writing the Java app, I wanted to create a cache source that didn’t require another script to pack the data, but was could be updated efficiently.

Choosing SQLite

At first, I tried Oracle BerkleyDB. Coding that up was pretty simple, and I got it working. However, I couldn’t find a good program to manage the BerkleyDB. The apps I came across either wasn’t specifically for BerkleyDBs or they required compiling, and I didn’t want to bother with that.

Since I’ve used SQLite in quite a few projects, it seemed like the next choice. While the mySQL database I’m putting the data into would have worked fine, that database could be remote, and I needed a local database to cache the API results into.

Threading Gotchas

A big challenge with SQLite is that you’re working directly with a file, and not a service like you do with mySQL. Because of this, special precautions need to be taken so that you don’t run into SQLITE_BUSY exceptions or accidentally corrupt your database by writing in two threads at once. This happened at least once while I was playing around with different configurations.

Tip #1: Share the SQLite connection between threads

Creating a shared connection should at least avoid corrupting the database. However, it’s still possible to come across the SQLITE_BUSY exception

Tip #2: Use a Lock

Sharing the connection worked very well at first. I was able to import data into a clean database without any problem. However, when the app was run again and started updated cache, I started running into SQLITE_BUSY exceptions.

The easy solution was to implement a lock.

private static final ReentrantLock dbLock = new ReentrantLock(true);

protected DBLockClosable lockDB() {
	dbLock.lock();
	return new DBLockClosable();
}

protected class DBLockClosable implements AutoCloseable {
	@Override
	public void close() {
		dbLock.unlock();
	}
}

public void put(Integer id, String content) {
    try(DBLockClosable dblc = lockDB()) {
        _openDB();
        // Upsert record
        _closeDB();
    }
}

The above code represents the lock logic in my cache class. It uses a ReentrantLock so that only one thread is allowed to read or write to the database at one time. It also uses a DBLockClosable object that will release the lock once the DBLockClosable object is destroyed, which happened at the end of the try block. (Thanks to my coworker Colby for this implementation suggestion)

 

After putting this change in place, caching works fine, though the updates are slower, but we’re talking milliseconds. For the purpose of this app, that’s fine. That delay is rendered irrelevant by the wait to get responses back from the WoW API.

ReentrantLock vs ReentrantReadWriteLock

There is a locking strategy where you allow multiple reads, but when there is a write, there are no active reads and only one active write. Java has ReadWriteLock and ReentrantReadWriteLock that makes implementing this strategy easy.

In theory, this should allow more reads to happens simultaneously because they aren’t being constrained to being sequential. In my tests, there was a very marginal difference, probably in part because my tests were based on a fresh cache database where there was always a write after a read.

Conclusion

Using SQLite as a cache database has proven pretty useful so far, though the database getting locked is a pain point. Simply having the database open with an SQL IDE like DB Browser for SQLite causes SQLITE_BUSY errors. Since this app will be running on a server without anything else accessing the cache database, this isn’t a problem.

There is a possibility that it is possible to configure the SQLite connection to properly handle multiple read/write queries on the same connection without implementing a mutex/lock. I just didn’t have the time/patience this go to try that.

For more info on threading with SQLite, there is some info here.