New Contributors to HugeCollections

What is the HugeCollections library?

The objectives of HugeCollections for the first release are fairly ambitious.
  • Scale to massive collections, i.e. sizes much larger than 2 billion, without a significant heap footprint or GC impact, i.e. by using heap-less (off-heap) memory.
  • Be faster and more efficient than using plain JavaBeans with an ArrayList or Map in Java, or a vector or unordered_map in C++.
  • Support durability (transparently saved and loaded from disk)
  • Support thread safety (with no overhead when it is not required) and use multiple threads implicitly, i.e. large operations are automatically distributed across threads.
  • Be faster and more efficient than using a database. Transactions will NOT be in this release. 
  • Support in one application what might have to be distributed otherwise.
  • Dynamic code generation as required (no need to pre-generate code in the build)
A prototype has been built which shows these objectives are possible. However, turning this library into a usable release will take some help.
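
To give a rough idea of what "heap-less" storage means in the first objective, here is a minimal sketch. It is not code from the library: the class name OffHeapLongList and its methods are made up for illustration, and it uses only the standard java.nio direct ByteBuffer. The values live in native memory, so the garbage collector only sees one small wrapper object however many elements are stored.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical fixed-capacity list of long values kept outside the Java heap. */
public class OffHeapLongList {
    // Direct buffer: its contents are allocated in native memory, not on the heap.
    private final ByteBuffer buffer;
    private final int capacity;
    private int size;

    public OffHeapLongList(int capacity) {
        this.capacity = capacity;
        // multiplyExact fails fast if the byte count does not fit in an int.
        int bytes = Math.multiplyExact(capacity, Long.BYTES);
        this.buffer = ByteBuffer.allocateDirect(bytes).order(ByteOrder.nativeOrder());
    }

    /** Appends a value; no object is created per element. */
    public void add(long value) {
        if (size == capacity) {
            throw new IllegalStateException("list is full");
        }
        buffer.putLong(size * Long.BYTES, value);
        size++;
    }

    /** Reads the value at the given index straight from native memory. */
    public long get(int index) {
        if (index < 0 || index >= size) {
            throw new IndexOutOfBoundsException("index " + index);
        }
        return buffer.getLong(index * Long.BYTES);
    }

    public int size() {
        return size;
    }

    public static void main(String[] args) {
        OffHeapLongList list = new OffHeapLongList(1_000_000);
        for (int i = 0; i < 1_000_000; i++) {
            list.add(i);
        }
        long sum = 0;
        for (int i = 0; i < list.size(); i++) {
            sum += list.get(i);
        }
        System.out.println("size=" + list.size() + " sum=" + sum);
    }
}
```

A single direct ByteBuffer is limited to just under 2 GB, so a collection with more than 2 billion elements would need several buffers (or memory mapped files for durability) and a long index. This sketch leaves that out.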

Note: I am open to people working on the things they find most interesting, even if this means some areas are more developed than others.

To have a look at the code, start with http://vanilla-java.googlecode.com/svn/sandbox/ which you can also check out.

Myself

Peter Lawrey - I have been working with Java for 12 years and on high performance systems for 15 years. I have worked at Investment Banks, a prop trading firm and Sun Microsystems.

Two new contributors

Rob Austin is currently contracting for a large investment bank, working on a low latency pricing and trading platform. He has over 10 years of Java experience.

Costantino Cerbo is a certified Java developer (SCJP and SCBCD) and software consultant with more than 6 years of experience. He has worked for one of Italy's largest banks. He is an Italian native speaker, fluent in German and strong in English (TOEFL: 607 points).

More contributors welcome

I am looking for additional contributors to:
- Document the high level approach used and its advantages.
- Write proper documentation and peer review the design.
- Code review the hand coded collections for list and hash map. (These are templates for the auto-generated classes.)
- Build a tool to assist the conversion of the templates into the code generation.
- Test cases for correct functionality.
- Performance and scalability tests.
- Comparison tests for performance and scalability.
- A comparison with the features of similar products like ehCache BigMemory, javolution, trove4j (also find other products worth comparing) and C++.
- Documentation and blog of the comparison.
- Examine thread safety support and tests.
- Auto multi-threading for filter() and visit() methods.
- Examine integration and examples of use in JVM languages like Scala, Groovy, Jython and JRuby, and see what support can be given, i.e. are there simple things which can be done to make it simpler or more natural to use?
- Produce a JTable GUI demo with one billion rows.

Later
- Add queue/dispatcher support.
- Add sorted index (a la TreeMap or map in C++) support.
- Add non-unique indexes.
- Simple remote support (RMI/RPC).
- Distributed copies of data and partitioning.

Any suggestions welcome.

My email

You can contact me as peter.lawrey on gmail.

Comments

  1. Maybe I could help support CSV import/export, a SQL query engine, JDBC support, and a simple web frontend. I would write a custom table implementation in my H2 database to allow access to such collections. This wouldn't necessarily be fast, but it would be convenient (running ad-hoc queries, import and export).

  2. @Thomas Mueller, That sounds like a good suggestion. CSV import and export would be particularly useful. How easy would implementing the other features be?

    I will add my email address

  3. @Aleksandr Panzin, That does sound useful, especially to support partitioned data.

  4. @Aleksandr Panzin, I have added this as an issue, to support lists and maps.

