News/Changes
2009-12-23
Update on the development progress
There has been quite some interesting development progress lately that I want to share. Besides that, there was again some stupid IMDb data that I corrected. First the enhancements and new features:
- The SQL statement tab in the search window has been enhanced a little, but it's still not complete. Currently I have a strange problem with the separator line between the SQL command entry field and the field displaying error messages. It shouldn't be hard to fix, but I don't see the cause yet: currently the position of the separator line can't be changed from the code (save/restore).
- After attending a PostgreSQL training last week I've played around with the
FULL TEXT indexing and search feature that PostgreSQL has supported since version
8.3. Word searches (the default JMDB behaviour) will be much faster
using the text index and the language-specific search option.
What does that mean in numbers, you might ask? Well, here we go:
- Actor (name) search: 2800 times faster (5.6 sec down to 2.5 msec)
- Movie (title) search: 1000 times faster (15.0 sec down to 15.3 msec)
- Because of the speed improvement above I was also thinking about a 'word completion' feature in the search window. It's the same thing you see in the URL field of a web browser when you type in a known/already visited website. As the full text indexes use some kind of ranking to index words, this could also be used to suggest possible words while the user types, which he could then select. Because this needs an additional database connection and has to run in an additional search thread, it will be done after the connection pooling enhancement is ready to be used (see below).
- I'm currently also going to check the timeframe search, as I suspect there is room for some speedup there too.
- Finally, I began to integrate the H2 database into JMDB, which is working apart from some minor things that need to be addressed. This will be completed and accessible together with the connection pooling core.
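As a rough illustration of the full text search mentioned above, here is a minimal Java/JDBC sketch. It is not the actual JMDB code: the table `actors`, the column `name`, the index name and the `'english'` text search configuration are all assumptions chosen for the example. Only `to_tsvector`, `plainto_tsquery`, the `@@` operator and GIN indexes are standard PostgreSQL 8.3+ features.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

final class FullTextSearchSketch {
    // Hypothetical schema: table "actors" with a text column "name".
    // The expression index lets PostgreSQL answer word searches without a table scan.
    static final String CREATE_INDEX =
        "CREATE INDEX actors_name_fts ON actors "
        + "USING gin (to_tsvector('english', name))";

    // The query must use the same expression as the index so the index can be used.
    static final String SEARCH =
        "SELECT name FROM actors "
        + "WHERE to_tsvector('english', name) @@ plainto_tsquery('english', ?)";

    /** Returns all names that contain every word in the given search text. */
    static List<String> searchActors(Connection con, String words) throws SQLException {
        List<String> names = new ArrayList<>();
        try (PreparedStatement ps = con.prepareStatement(SEARCH)) {
            ps.setString(1, words);
            try (ResultSet rs = ps.executeQuery()) {
                while (rs.next()) {
                    names.add(rs.getString(1));
                }
            }
        }
        return names;
    }
}
```

With an open PostgreSQL connection, `FullTextSearchSketch.searchActors(con, "harrison ford")` would return the matching rows. Whether a language-specific configuration such as `'english'` or the plain `'simple'` configuration fits person names better would need to be tested.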
As I said above, I again stumbled over some stupid IMDb entries. The most annoying one was in the keywords, where one entry was more than 100 characters in length. I already fixed it online in the database, but it will be January before the list file is updated. There was also one very long entry in the color-info file, but that one did make sense. I'm going to change the import and the table structure. Currently the color info table contains an id and a colorinfo column, but the file also holds additional information which is currently stored in the colorinfo column as well. The next version will store this additional information in a new column named additional, as is already done for some other data files.
Ok, that's it for the moment. There will be an update shortly that's not only for the beta testers. ;)
2009-11-22
General update
There hasn't been much progress lately, but moving/renovating is now complete. I also had some trouble with a water leak coming from the apartment above mine. The result didn't look very nice, I can tell you. Anyway, I'm working on JMDB again and also on some webOS (Palm Pre/Pixi) applications right now. I hope I'm still able to release a new JMDB version this year, which would be nice for the 10-year anniversary of the Java Movie Database (Note: The public GUI application release was in 2001, but development started in late 1999 with a command-line application).
2009-09-27
Update on the JMDB development progress
I've again been busy at work and also with some renovating, which slowed me down on the JMDB development. Still, I was able to make a few enhancements:
- I've updated the Movie Collection feature so it now also
works with PostgreSQL (which is 10x faster than MySQL
when looking up additional movie information).
- It's now fully using JMDB's database abstraction layer.
- The samples have been updated.
- Using a flag in the *.def file it's now possible to turn off creating the movie temp table.
- Using a flag in the *.def file it's now possible to define the codepage to be used when reading the *.data file. Previously the movie titles could break when you created the *.data file by exporting data from an Excel or Open Office Calc sheet. I think I might include a sample data file in the next release.
- I received an Italian language file which will be included in the upcoming release (many thanks to Vittorio Monaco). Maybe I'll also add a download link so it can be used with JMDB 1.36 until the new release is available.
- I also added a user extension to import additional data. This has been added at the request of a PhD student who needs to access the MovieLens data. That data (u.item file) is pretty outdated when it comes to the naming of the titles. I had to implement some lookup functionality to find the correct movies in the current database. This is not useful for the regular JMDB user, but I want to let you know that I also work on such add-ons.
- I'm also working on a window (actually a tab in the search window) that allows the user to enter his own SQL query. That's not yet working but it's on its way [Hi Jon! :)].
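To show why the codepage setting matters for the *.data file, here is a small Java sketch. It is only an illustration, not JMDB's import code: the class and method names are made up, and the codepage string stands in for whatever value the *.def flag carries.

```java
import java.nio.charset.Charset;

final class CodepageSketch {
    /**
     * Decodes raw bytes from a data file using the given codepage name.
     * The codepage would come from the *.def file (hypothetical here).
     */
    static String decode(byte[] raw, String codepage) {
        return new String(raw, Charset.forName(codepage));
    }
}
```

A title exported from a spreadsheet in a single-byte codepage stores "é" as the byte 0xE9. Decoding those bytes as ISO-8859-1 gives the intended title, while decoding them as UTF-8 produces a replacement character, which is the kind of broken title described above.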
There have also been two requests for the IMDb movie ID file that is updated by a JMDB user and uploaded to me from time to time (the latest file is from April 2009; I don't know how far IMDb's article swapping had progressed in April!).
IMDb again had incorrect and incomplete data in their list files. The number of errors JMDB writes into the IMDb error file is not as big as before, but there are still some missing links to movies. There have also been some release-dates entries with a '?' instead of a date. I ask you: Don't you think an exact date should be required for the 'release-dates'? If the date isn't known, it's worthless to tell every user that something has been released in a country without any clue when. I really think IMDb should stop adding stupid information like this.
2009-08-22
Next JMDB releases (development progress)
Sorry I didn't update the website within the last 3 months. I've been busy with my new job and right now I'm also moving. Still, JMDB has been updated in the meantime.
Most of the time went into updating the import code, where a user pointed me in the right direction (JDBC Batch Inserts). At first I updated only some methods to use Batch Inserts, and the speed improvement was good for PostgreSQL. MySQL at first wasn't any faster, but that changed after I updated to a still unreleased Connector/J JDBC driver and used the URL parameter "rewriteBatchedStatements=true". I learned that from a post by the Connector/J developer Mark Matthews on his blog (Mark Matthews: A 10x Performance Increase for Batch INSERTs With MySQL Connector/J Is On The Way....). With this parameter and the new daily build of the MySQL JDBC driver, MySQL could again compete with PostgreSQL, which had been taking the lead performance-wise. PostgreSQL was still a little bit faster, but only because of the faster index creation, especially on the movies2actors table (about 11 million entries for each of the two indexes). About two weeks ago the external InnoDB plugin v1.0.4 was released for MySQL v5.1.37 (InnoDB Plugin Download) and I tested it as well. The blog entry announcing the InnoDB Plugin v1.0.4 lists a few enhancements over the built-in InnoDB engine of MySQL. With the external InnoDB plugin the index creation is now a lot faster, so I can recommend this plugin if you want to speed up MySQL (I'll attach a PDF file later with some performance charts for PostgreSQL and MySQL). The plugin also offers file compression, but I haven't tested that yet.
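For readers who haven't used JDBC batch inserts, the following Java sketch shows the general pattern. It is not taken from the JMDB import code: the `movies` table, the `title` column, the batch size of 1000 and the helper names are assumptions for illustration. Only `addBatch()`, `executeBatch()` and the Connector/J URL parameter `rewriteBatchedStatements=true` come from JDBC and the text above.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

final class BatchInsertSketch {
    // Hypothetical table/column; the real JMDB schema may differ.
    static final String INSERT = "INSERT INTO movies (title) VALUES (?)";
    // Assumed batch size; the best value depends on driver and database.
    static final int BATCH_SIZE = 1000;

    /** Builds a MySQL JDBC URL with the batch rewrite parameter enabled. */
    static String mysqlUrl(String host, String database) {
        return "jdbc:mysql://" + host + "/" + database + "?rewriteBatchedStatements=true";
    }

    /** Inserts all titles in batches inside one transaction; returns the number queued. */
    static int insertTitles(Connection con, List<String> titles) throws SQLException {
        int sent = 0;
        boolean oldAutoCommit = con.getAutoCommit();
        con.setAutoCommit(false);
        try (PreparedStatement ps = con.prepareStatement(INSERT)) {
            int pending = 0;
            for (String title : titles) {
                ps.setString(1, title);
                ps.addBatch();                 // queue the row instead of sending it
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();         // send the queued rows in one go
                    sent += pending;
                    pending = 0;
                }
            }
            if (pending > 0) {                 // flush the last partial batch
                ps.executeBatch();
                sent += pending;
            }
            con.commit();
        } catch (SQLException e) {
            con.rollback();
            throw e;
        } finally {
            con.setAutoCommit(oldAutoCommit);
        }
        return sent;
    }
}
```

The gain comes from fewer round trips to the server. With MySQL the driver only sends the queued rows as combined statements when the rewrite parameter is set, which matches the observation above that MySQL was not faster at first.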
You might want to know how much faster the current development version of JMDB really is. Well, one of the beta testers (he's using MySQL) was able to import all IMDb list files in 29 min 45 sec, while it took 2 h 55 min 53 sec with the previous version (comparable with JMDB v1.36). That's about 5.9 times faster. I'm going to clean up the current development version within the next weeks and release it to the public. This way none of you have to wait any longer for the next release. I think the speed improvement achieved is worth it.
The IMDb data is still pretty messed up (after the import of the IMDb list files just look into the IMDb_Error.log JMDB creates). I'm going to hunt down someone at IMDb to get those problems solved. Maybe I should try to get in contact with Col Needham directly.
2009-05-16
They finally did it...one of the problematic IMDb data errors is fixed
The IMDb error in the AKA-Names file is finally fixed (IMDb data from 2009-05-15), so you don't have to fix the file manually and JMDB v1.36 doesn't crash any more while importing the IMDb files. As the german-aka file hasn't been uploaded to the FTP server since last month or so (and hadn't been updated for years before that), you need to disable it in the JMDB setup dialog, because JMDB imports the file by default. That's all for now. Once I have more to share I'll let you know.
2009-05-09
Here again some updates on "Ups, they did it again..." (see 2009-04-06) and other stuff
As the IMDb errors I wrote about here haven't been fixed, I used the normal IMDb Helpdesk last weekend to report the error in the AKA-Names file. This time I got a reply, which I'm going to share:
Re: AKA Name "Vanessa" includes a TAB char
Thanks for reporting the problem to us.
We are aware of this technical issue and our staff is looking into it.
We hope to have it fixed as soon as possible.
Sorry for any inconvenience caused and thanks once again for bringing this to our attention!
----
Regards,
[Name-Removed]
The IMDb Help Desk
I still have my doubts but we'll see if and when this changes.
Other updates on JMDB
- Blu-ray release dates: I asked Hi-Def Digest if I could use their release dates list and got the reply that I'm allowed to integrate the release dates into JMDB, but I need to link back to them. So far I haven't started coding, but as there may be some interest I wanted to let you know.
- Notifications: For some time I've been looking into the available notification frameworks. Mac OS X users may know Growl, but there are other platforms as well. I just found out that there is now also Growl for Windows, and I'm going to support that in one of the next closed beta releases. Which notification framework should I use for Linux? It seems Mumbles is for Gnome. Then there is Specto. So far I haven't taken a closer look, and maybe someone can point me to something else I should support (see the Contact note on the left). All I need is a way to call it from Java.
- Database abstraction: There are some updates on the database abstraction I've implemented. For the connection pooling I'm going to change the current approach. I just implemented something at work that seems to be useful for JMDB. Sorry, I can't tell you more right now.
I did a quick hack to integrate Growl for Windows. Below you see the result using the standard notification box and then again using the plain box (there are more styles). As I didn't send an icon together with the notification message, it's using the default icon.

Screenshot of the "standard" Growl for Windows notification style (JMDB development version 2009-05-09)

Screenshot of the "plain" Growl for Windows notification style (JMDB development version 2009-05-09)
2009-04-25
Update on "Ups, they did it again..." (see 2009-04-06)
Sad to say, but the reported errors still haven't been fixed. At least there are fewer errors in the release-dates.list, as only one entry at the top has a release date but no title. The other errors are still there. Let's see if wonders happen and the problems are fixed next week. Normally my updates should have been processed by now, judging by the IMDb processing times page.
I already see it coming that I need to release my internal development version of JMDB v1.40 to the public so the broken IMDb data can be used without correcting the files manually. I haven't touched the current development version for the last couple of days as I have to work on something else right now. End of next week I should be able to resume the work again.
2009-04-17
JMDB v1.40 progress and Update on "Ups, they did it again..." (see 2009-04-06)
There has been again much progress on JMDB development within the last two weeks. This includes support of two more list files. The JMDB v1.40 development version has been sent to the test users.
- Added support for the alternate-versions.list
- Added support for the Crazy-Credits.list
Ups, they did it again... Almost two weeks after I fixed the error online using the correction form, the AKA-Names list still contains this error. It gets even better! One of the beta testers pointed out that the AKA-Names file from last week contains another error where a TAB character is involved. The good thing is that this one doesn't break the import code as the other one does.
For JMDB v1.36 you still need to correct the error manually as described below (news entry from 2009-04-06), or you can disable the import of the AKA-Names file. JMDB v1.40 contains a workaround for the errors, but I can't release it to the public yet and I also can't apply the fix to the current release version. I really hope that next week my fix of the data finally makes it into the list file export.
Other errors in the release-dates.list, reported about four weeks ago,
also haven't been fixed yet. As I wrote earlier, these are worked around
in JMDB v1.40. Generally all kinds of IMDb data errors are written to the IMDb_Error.log
that JMDB creates while importing new data.
Currently this file is 6.5 MB in size (with IMDb list files from 2009-04-10)
if you select all files to import. About 80% of the errors
are from the locations.list, which will be supported in JMDB v1.40 and up.
Most of the other errors (a little over 15%) come from the
german-aka-titles/italian-aka-titles files, because those files have been unsupported
by IMDb for a few years now.
If IMDb changes the title of a movie a little bit (movies.list), the
outdated files are not updated. The result is that the titles in
those aka-titles files are not found in the movies.list, and each
entry not found shows up in the IMDb_Error.log.
2009-04-06
Ups, they did it again...IMDb released a broken AKA-Names list file which is crashing the JMDB import process
IMDb released a broken AKA-Names list file last weekend that is crashing
the JMDB import. I already fixed the IMDb data using a web form, but it
will take the IMDb list keepers at least until next Saturday to fix the
problem (maybe longer). A TAB has been added in the data where it doesn't
belong, and this is not the first time this has happened. IMDb really needs
more checks when data is stored in the database.
If you want to use the aka-names.list.gz released on 2009-04-03 you need
to extract the file and fix the broken entry using a text editor.
(aka Vanessa)
(aka Kennedy, Mrs. V.)
(aka Videl, Vanessa)
The first aka entry contains a TAB between aka and Vanessa that has to be removed. After you have fixed this issue, save the file and start the JMDB file import. Now it no longer crashes while importing the AKA-Names.
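Instead of editing the extracted file by hand, the same repair could be done with a few lines of Java. This is only a sketch under the assumption that the stray TAB sits directly after the word "aka" (optionally surrounded by spaces); the class name is made up and this is not the workaround used inside JMDB.

```java
import java.util.regex.Pattern;

final class AkaTabFix {
    // Matches "(aka" followed by whitespace that contains at least one TAB.
    private static final Pattern AKA_TAB = Pattern.compile("\\(aka[ \\t]*\\t[ \\t]*");

    /** Replaces a TAB between "(aka" and the name with a single space. */
    static String repair(String line) {
        return AKA_TAB.matcher(line).replaceAll("(aka ");
    }
}
```

Applied to each line of the extracted aka-names.list, `repair` leaves correct entries such as "(aka Kennedy, Mrs. V.)" unchanged and only touches lines where a TAB follows "(aka".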
The release-dates.list.gz also contains several errors (see news from 2009-03-29) which I worked around for the next major JMDB release v1.40 (still in early internal testing stage). The errors have been reported to IMDb round about two weeks ago but so far they haven't been fixed.
While I'm talking about JMDB v1.40, I can tell you that I enhanced the locations list a little bit. I created hyperlinks that open the web browser with the location details using Google Maps. Most of the time you'll get a map showing the location used to film the movie. Google Maps has some problems if the location contains the name of a building and then doesn't find the location, but in general it's working quite nicely.
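A hyperlink like that can be built by URL-encoding the location text and appending it to a maps query URL. The sketch below assumes the `https://maps.google.com/maps?q=` form; the URL JMDB actually generates is not stated here, and the class name is made up.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

final class LocationLinkSketch {
    // Assumed base URL for a Google Maps search query.
    private static final String BASE = "https://maps.google.com/maps?q=";

    /** Builds a map link for a location string taken from the locations list. */
    static String mapsUrl(String location) {
        try {
            return BASE + URLEncoder.encode(location, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }
}
```

On Java 6 and later the resulting link can be opened with `java.awt.Desktop.getDesktop().browse(java.net.URI.create(url))`; older runtimes need a platform-specific way to start the browser.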
2009-03-29
First Alpha of JMDB v1.40 sent to external testers!
I've been working hard on the next JMDB version. It doesn't have a release date yet, as I'm going to start a new job shortly and can't plan anything right now.
So far the feedback on the new version was very good as no problems have been found. There are still some things in the works but here are some functions already working:
- Added support for the literature.list
- Added support for the locations.list
- Updated checks on the release-dates.list (because IMDb had recently again some false entries)
- Updated business.list import and output to display the two letter IMDb codes. The codes help you to see where the money has been earned from (e.g. "RT:" meaning "Rentals").
- Modified query behaviour to support newer JDBC drivers for MySQL and PostgreSQL. The new PostgreSQL driver (tested only 8.3-604 so far) was twice as fast as the old one for the movies.list import (5:20 vs 10:20). On the other files it's only a slight speedup. For the MySQL driver the encoding parameter at the DB-URL has to be removed. I couldn't really see a speedup so far.
- Added support for HSQLDB. So far this is not completed and HSQLDB is really struggling with the amount of data (HSQLDB has an 8 GB file limit for the database, which JMDB almost hits). I also had some problems with exceptions in the standalone running HSQLDB v1.8.0.10. So far I haven't tried the 1.9.0 alpha.
- The database setup dialog also includes SQLite, IBM DB2, Oracle and MS SQL-Server, but those are not yet fully supported. SQLite (almost ready) and Oracle will be completed next.
Some other things which should improve the import speed are also in the works. The query speed should be increased by using multiple connections (connection pooling).
2009-02-18
The webserver hosting the JMDB packages will be moved shortly!
The provider hosting my personal website (including other projects) is moving the complete server farm to a new data center. In the night from 18th February to 19th February 2009 (CET) my server will be affected by this move.
As the JMDB packages are stored on this personal website there will be problems downloading the latest JMDB releases until they have completed that task. Please excuse this inconvenience. Many thanks in advance!
2009-02-17
JMDB v1.36 refresh is available at the download section!
The refreshed version of the JMDB v1.36 is now available.
The only things changed between the first release and this one are two
more language files for Chinese mainland and Taiwan.
You don't have to update if you don't need the Chinese language files.
2009-02-06
JMDB v1.36 is available at the download section!
The new JMDB v1.36 is now available. You should upgrade as soon as possible. As before there are installer packages for eComStation (OS/2), Windows and a simple ZIP archive for most other operating systems available. JMDB can be used on every platform where a Java runtime (Java >=1.2) is available. Some smaller enhancements are also available for Mac OS X users.
2009-02-05
JMDB v1.36 will be available shortly!
There has been a problem with the aka-titles.list(.gz) for some time now
that the upcoming JMDB release v1.36 will fix. The startup scripts
will also be updated. I've already created the JAR file of the new version
but I have to update the documentation and create the installation packages
for each of the operating systems.
I'm also still working on JMDB v1.40, which will follow next. That release will
include the database changes I already talked about.
As you might have noticed, I also updated the IMDb data development picture on the entry page. The startup scripts for JMDB v1.36 are also being updated because the limit for the import with 300 MB of memory has been reached again. If you haven't updated the -Xmx parameter yet, you should have seen the OutOfMemoryError warning. I got it when I tried to import the IMDb data from 2009-01-30 with Java 1.6.0_11-b03 on Vista (32 bit) while JMDB was processing the biographies.list(.gz) file.
Finally, the news section has been extended to reflect that it's now 2009. I think this will be a great year for JMDB. There might even be a JMDB-Mobile version with a subset of the IMDb data. More on that once I have released JMDB v1.40.
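To check which heap limit the JVM actually received from the -Xmx parameter in a startup script, a small Java sketch like the following can help. The class name and the idea of comparing against a required value are assumptions for illustration; the amount the import really needs depends on the IMDb data and is not specified here.

```java
final class HeapCheckSketch {
    /** Returns the maximum heap size the JVM will try to use, in megabytes. */
    static long maxHeapMb() {
        return Runtime.getRuntime().maxMemory() / (1024L * 1024L);
    }

    /** True if the configured heap is at least the (hypothetical) required size. */
    static boolean isEnough(long maxHeapMb, long requiredMb) {
        return maxHeapMb >= requiredMb;
    }
}
```

Started with, for example, `java -Xmx512m ...`, `maxHeapMb()` reports a value close to 512 (the exact number varies slightly between JVMs), so a warning could be shown before the import starts instead of failing in the middle of a large file.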