MapDB in practice [a Java-based database] http://blog.csdn.net/qy20115549/article/details/53207093
Handling 3 billion objects in a Java Map with 16 GB of RAM http://www.coderli.com/translate-java-collections-bigdata-mapdb/
Using MapDB with Spring integration http://cywhoyi.iteye.com/blog/2264396
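MapDB is a fast, easy-to-use embedded Java database engine. It provides concurrent Maps, Sets, and Queues backed by disk or by off-heap storage (off-heap lets Java work with memory outside the garbage-collected heap, similar to malloc and free in C).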
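To make the off-heap idea concrete, here is a minimal sketch that uses only the JDK's `ByteBuffer.allocateDirect`. It is not MapDB code and makes no claim about how MapDB manages its storage; the class `OffHeapPoints` and its methods are hypothetical names chosen for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical helper (not part of MapDB): keeps (x, y) pairs in a direct buffer,
// i.e. in memory that lives outside the garbage-collected Java heap.
final class OffHeapPoints {
    private static final int BYTES_PER_POINT = 2 * Double.BYTES;

    private final ByteBuffer buffer;
    private final int capacity;

    OffHeapPoints(int capacity) {
        this.capacity = capacity;
        // allocateDirect reserves native memory instead of a heap array
        this.buffer = ByteBuffer.allocateDirect(capacity * BYTES_PER_POINT);
    }

    void put(int index, double x, double y) {
        checkIndex(index);
        int offset = index * BYTES_PER_POINT;
        buffer.putDouble(offset, x);                 // absolute write, position untouched
        buffer.putDouble(offset + Double.BYTES, y);
    }

    double x(int index) {
        checkIndex(index);
        return buffer.getDouble(index * BYTES_PER_POINT);
    }

    double y(int index) {
        checkIndex(index);
        return buffer.getDouble(index * BYTES_PER_POINT + Double.BYTES);
    }

    boolean isOffHeap() {
        return buffer.isDirect();
    }

    private void checkIndex(int index) {
        if (index < 0 || index >= capacity) {
            throw new IndexOutOfBoundsException("index " + index);
        }
    }
}
```

Because the values are stored as raw bytes rather than as objects, the garbage collector does not have to trace them, which is the main appeal of off-heap storage for large data sets.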
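Business scenario:
A friend's company needs to take a coordinate and find the nearest latitude/longitude coordinate in a 200M address library. There are three main difficulties:
1. Quickly locating a coordinate in a region of the 2D plane. For example, (-1,-1) should fall in the lower-left part of the xy plane. I use a KDTree for this.
2. Once the tree is built, I do not want to rebuild it every time, so it has to be cached. With a distributed cache such as Redis, network bandwidth looked like the bottleneck, and our address library may be updated frequently. With an in-JVM Map as the cache, memory would be exhausted almost immediately. After switching to MapDB I found that it offers several storage modes, and in my comparison it was both faster and smaller in footprint.
3. Computing the distance between two points. On a 2D plane this is not hard: with vectors and standard tools such as sin and cos, the result is obtained quickly.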
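The post does not show its distance code. As one possible reading of point 3, the sketch below uses the haversine formula, a common sin/cos-based way to get the great-circle distance between two latitude/longitude pairs. The class name `GeoDistance`, the method name, and the Earth radius constant are assumptions made for this example, not the author's implementation.

```java
// Hypothetical helper: great-circle distance via the haversine formula.
final class GeoDistance {
    // Mean Earth radius in meters (an approximation; the Earth is not a perfect sphere).
    static final double EARTH_RADIUS_M = 6_371_000.0;

    // All arguments are in degrees; the result is in meters.
    static double haversineMeters(double lat1, double lon1, double lat2, double lon2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double dPhi = Math.toRadians(lat2 - lat1);
        double dLambda = Math.toRadians(lon2 - lon1);

        double sinHalfPhi = Math.sin(dPhi / 2);
        double sinHalfLambda = Math.sin(dLambda / 2);
        double a = sinHalfPhi * sinHalfPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfLambda * sinHalfLambda;

        // Clamp to 1.0 to guard against rounding slightly above the asin domain.
        return 2 * EARTH_RADIUS_M * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }
}
```

For points only a few hundred meters apart, a plain Euclidean distance on projected coordinates is usually close enough and cheaper; haversine is the safer default when the points can be far apart.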
Configuration in Spring
<bean id="dbFile" class="java.io.File">
    <constructor-arg value="/usr/local/DB/monitor.DB"></constructor-arg>
</bean>

<bean id="dbFactory" class="org.mapdb.DBMaker" factory-method="newFileDB">
    <constructor-arg ref="dbFile" />
</bean>

<bean id="shutdownHook" factory-bean="dbFactory" factory-method="closeOnJvmShutdown">
</bean>

<bean id="database" factory-bean="dbFactory" factory-method="make">
</bean>
Loading at Spring application startup
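import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import org.mapdb.BTreeMap;
import org.mapdb.DB;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.context.ApplicationContext;
import org.springframework.web.context.support.WebApplicationContextUtils;

public class StartupListener implements ServletContextListener {

    private static final Logger LOG = LoggerFactory.getLogger(StartupListener.class);

    @Override
    public void contextInitialized(ServletContextEvent e) {
        ApplicationContext ctx =
                WebApplicationContextUtils.getWebApplicationContext(e.getServletContext());

        // Check the context before using it: the lookup returns null
        // when no root application context has been registered.
        if (ctx == null) {
            LOG.error("app start fail!");
            throw new RuntimeException("WebApplicationContextUtils.getWebApplicationContext() Fail!");
        }

        // AddressInfoMapper addressInfoMapper = (AddressInfoMapper) ctx.getBean("addressInfoMapper");
        DB db = (DB) ctx.getBean("database");
        BTreeMap<String, String> monitorDataMap = db.getTreeMap("monitorDataMap");
        // monitorDataMap.put("name", "Young");
        // You can load the address information into MapDB here.
        db.commit();

        LOG.info("app start success.");
    }

    @Override
    public void contextDestroyed(ServletContextEvent sce) {
    }
}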
Usage in a Service
// The injected database; the maps are obtained from it.
private DB database;
private BTreeMap<String, String> monitorDataMap;

public void setDatabase(DB database) {
    this.database = database;
}

// Runs after injection (javax.annotation.PostConstruct), so database is already set.
@PostConstruct
public void init() throws Exception {
    this.monitorDataMap = database.getTreeMap("monitorDataMap");
}
Building the KDTree
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Point and KDTreeNode are the author's own classes and are not shown in this post.
public class KDTree {

    // prevent instantiation; use build() instead
    private KDTree() {}

    private KDTreeNode root;

    public static KDTree build(List<? extends Point> points) {
        KDTree tree = new KDTree();
        // Work on a copy: the recursive build sorts in place and would
        // otherwise reorder (or fail on) the caller's list.
        tree.root = build(new ArrayList<Point>(points), 0);
        return tree;
    }

    private static KDTreeNode build(List<? extends Point> points, int depth) {
        if (points.isEmpty()) return null;

        // Alternate the splitting axis by depth: 0 = x, 1 = y.
        final int axis = depth % 2;

        Collections.sort(points, new Comparator<Point>() {
            public int compare(Point p1, Point p2) {
                double coord1 = p1.getCoords()[axis];
                double coord2 = p2.getCoords()[axis];
                return Double.compare(coord1, coord2);
            }
        });

        // The median becomes this node; each half becomes a subtree.
        int index = points.size() / 2;
        KDTreeNode leftChild = build(points.subList(0, index), depth + 1);
        KDTreeNode rightChild = build(points.subList(index + 1, points.size()), depth + 1);

        Point point = points.get(index);
        return new KDTreeNode(point, axis, leftChild, rightChild);
    }

    // Note: root is null for a tree built from an empty list, so the
    // lookups below assume at least one point.
    @SuppressWarnings({"unchecked"})
    public <T extends Point> T findNearest(Point point) {
        return (T) findNearest(point, 1).get(0);
    }

    public List<? extends Point> findNearest(Point point, int amount) {
        return root.findNearest(point, amount);
    }

    @SuppressWarnings({"unchecked"})
    public <T extends Point> T getRootPoint() {
        return (T) root.getPoint();
    }
}
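The class above delegates the actual lookup to `KDTreeNode.findNearest`, which the post does not show. The following is a minimal, self-contained sketch of how a single-nearest-neighbor search over a 2D KD-tree is commonly written. It is not the author's `KDTreeNode`: the names `KdSketch`, `Node`, `nearest`, and `search` are hypothetical, points are plain `double[]{x, y}` arrays, and it returns one point rather than the `amount` nearest.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the unseen KDTreeNode logic; 2D points as double[]{x, y}.
final class KdSketch {

    static final class Node {
        final double[] point;
        final int axis;
        final Node left;
        final Node right;

        Node(double[] point, int axis, Node left, Node right) {
            this.point = point;
            this.axis = axis;
            this.left = left;
            this.right = right;
        }
    }

    // Same median-split construction as the KDTree class, on a copy of the input.
    static Node build(List<double[]> points, int depth) {
        if (points.isEmpty()) return null;
        int axis = depth % 2;
        List<double[]> sorted = new ArrayList<>(points);
        sorted.sort(Comparator.comparingDouble(p -> p[axis]));
        int mid = sorted.size() / 2;
        return new Node(sorted.get(mid), axis,
                build(sorted.subList(0, mid), depth + 1),
                build(sorted.subList(mid + 1, sorted.size()), depth + 1));
    }

    // Returns the stored point closest to target, or null for an empty tree.
    static double[] nearest(Node root, double[] target) {
        return search(root, target, null);
    }

    private static double[] search(Node node, double[] target, double[] best) {
        if (node == null) return best;
        if (best == null || sqDist(node.point, target) < sqDist(best, target)) {
            best = node.point;
        }
        // Signed distance from the target to this node's splitting line.
        double diff = target[node.axis] - node.point[node.axis];
        Node near = diff < 0 ? node.left : node.right;
        Node far = diff < 0 ? node.right : node.left;

        best = search(near, target, best);
        // Visit the far side only if the splitting line is closer than the best hit so far.
        if (diff * diff < sqDist(best, target)) {
            best = search(far, target, best);
        }
        return best;
    }

    private static double sqDist(double[] a, double[] b) {
        double dx = a[0] - b[0];
        double dy = a[1] - b[1];
        return dx * dx + dy * dy;
    }
}
```

The pruning test is what makes the tree pay off: a whole subtree is skipped whenever the splitting line is farther from the target than the best candidate already found. Squared distances are compared throughout, which avoids a square root per node.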
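Personal conclusion:
I have not studied MapDB's internals in depth; I adopted it as a stopgap, so more bugs will probably surface later. Even so, performance improved noticeably after adopting it: the nearest point is found within 3-5 ms, and memory consumption is a little over 40 MB.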