JanusGraph Cluster Setup
- Preface
- Software Downloads
- Configuration Steps
  - Hadoop installation
  - ZooKeeper installation
  - HBase installation
  - Elasticsearch installation
  - JanusGraph installation
- Verifying the Installation
- Troubleshooting
Preface
Documentation on JanusGraph is genuinely scarce and scattered, so after recently setting up a cluster I am writing down the pitfalls I hit. JanusGraph lets you freely choose its storage and index backends. I previously wrote up a Hadoop deployment for a non-root user, and this article continues from that setup.
The servers and node descriptions involved are the same as in that Hadoop article:
Hadoop deployment as a non-root user.
Software Downloads
The versions used in this deployment are hadoop 2.10.1 + zookeeper 3.5.8 + hbase 2.2.6 + elasticsearch 6.6.0 + janusGraph 0.5.2.
Official download links are provided below; for convenience, everything can also be fetched in one go from this bundle: JanusGraph distributed installation package for Linux servers.
1. hadoop
Since we use distributed storage, Hadoop must be set up first; refer to the previous post for download and installation. Download: hadoop official site.
2. zookeeper
ZooKeeper is a distributed, open-source coordination service for distributed applications. It is an open-source implementation of Google's Chubby and a key component of Hadoop and HBase. It provides consistency services for distributed applications, including configuration maintenance, naming, distributed synchronization, and group services. ZooKeeper official download.
This deployment runs three ZooKeeper instances.
3. hbase
HBase is a distributed, column-oriented open-source database. The technology originates from Fay Chang's Google paper "Bigtable: A Distributed Storage System for Structured Data". Just as Bigtable builds on the distributed storage provided by the Google File System, HBase provides Bigtable-like capabilities on top of Hadoop. HBase is a subproject of the Apache Hadoop project. Unlike typical relational databases, it is suited to storing unstructured data, and it uses a column-based rather than row-based model.
hbase official download
4. elasticsearch
Elasticsearch is a search server based on Lucene. It provides a distributed, multi-tenant full-text search engine behind a RESTful web interface. Developed in Java and released as open source under the Apache License, it is a popular enterprise search engine. Used in cloud environments, it offers near-real-time search and is stable, reliable, fast, and easy to install and use. Official clients are available for Java, .NET (C#), PHP, Python, Apache Groovy, Ruby, and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.
elasticsearch official download
5. janusGraph
For an introduction to JanusGraph, see my earlier Windows 10 installation tutorial: JanusGraph setup on Windows 10.
janusGraph official download
Configuration Steps
As with the Hadoop installation, unless stated otherwise, commands are run on the master64 server.
Hadoop installation
For installation, see the previous post: Hadoop deployment as a non-root user.
ZooKeeper installation
On the master server, extract the archive and rename the resulting directory:
tar -zxvf apache-zookeeper-3.5.8-bin.tar.gz
mv apache-zookeeper-3.5.8-bin zookeeper
Enter the conf directory to configure:
cd zookeeper/conf
mv zoo_sample.cfg zoo.cfg
vim zoo.cfg
Below is the content of zoo.cfg. Note in particular: dataDir is a directory of your choosing, and the last three lines list the cluster members; node1, node2, and node3 are already in the system hosts file, otherwise they must be written as IP addresses.
# The number of milliseconds of each tick
tickTime=2000
# The number of ticks that the initial
# synchronization phase can take
initLimit=10
# The number of ticks that can pass between
# sending a request and getting an acknowledgement
syncLimit=5
# the directory where the snapshot is stored.
# do not use /tmp for storage, /tmp here is just
# example sakes.
dataDir=/home/hadoop/zookeeper/data
# the port at which the clients will connect
clientPort=2181
# the maximum number of client connections.
# increase this if you need to handle more clients
#maxClientCnxns=60
#
# Be sure to read the maintenance section of the
# administrator guide before turning on autopurge.
#
# http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
#
# The number of snapshots to retain in dataDir
#autopurge.snapRetainCount=3
# Purge task interval in hours
# Set to "0" to disable auto purge feature
#autopurge.purgeInterval=1
server.1=node1:2888:3888
server.2=node2:2888:3888
server.3=node3:2888:3888
In the dataDir directory configured above, create a myid file; its content should be 1 on the 64 server, 2 on the 178 server, and 3 on the 179 server (if the server names are confusing, see the Hadoop post).
cd ../data
vim myid
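The two steps above can also be scripted. A minimal sketch, assuming the dataDir from zoo.cfg sits under the current user's home directory (adjust DATA_DIR and MYID per node):

```shell
# Write this node's id into dataDir/myid.
# DATA_DIR must match dataDir in zoo.cfg (path here is an assumption);
# MYID must match the server.N entry for this host:
# 1 on node1 (the 64 server), 2 on node2 (178), 3 on node3 (179).
DATA_DIR=${DATA_DIR:-$HOME/zookeeper/data}
MYID=${MYID:-1}
mkdir -p "$DATA_DIR"
echo "$MYID" > "$DATA_DIR/myid"
cat "$DATA_DIR/myid"
```

Run it once on each server with the matching MYID value.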
This completes the configuration on the 64 server. Package it up and distribute it to the 178 and 179 servers, adjusting paths where necessary:
tar zcvf zookeeper.master.tar.gz zookeeper
scp zookeeper.master.tar.gz node2:/home/hadoop
scp zookeeper.master.tar.gz node3:/home/hadoop
After copying to the two servers, extract the archive there and edit each server's myid file; the id simply corresponds to server.1, server.2, and server.3 in zoo.cfg.
Before starting ZooKeeper, you can first add its environment variables in ~/.bashrc.
Make them take effect:
source ~/.bashrc
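For reference, the ~/.bashrc additions might look like the following; the variable name and the install path under the home directory are my assumptions, so adapt them to your own layout:

```shell
# ZooKeeper environment (path assumed: wherever you moved the extracted dir).
export ZOOKEEPER_HOME=$HOME/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
```

With zookeeper/bin on PATH, the zkServer.sh commands below work from any directory.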
The main commands to start and stop ZooKeeper and to query its status:
zkServer.sh start
zkServer.sh stop
zkServer.sh status
Start ZooKeeper on all three servers and check the status: one of them becomes the leader and the other two are followers.
HBase installation
Extract the archive, rename the directory, and enter conf:
tar zxvf hbase-2.2.6-bin.tar.gz
mv hbase-2.2.6 hbase
cd hbase/conf
vim hbase-env.sh
Add three lines: the Java path depends on your environment; HBASE_CLASSPATH points at Hadoop's etc configuration directory; and HBASE_MANAGES_ZK must be changed to false so HBase uses our ZooKeeper rather than its bundled one.
export JAVA_HOME=/usr/lib/jvm/jdk1.8.0_162
export HBASE_CLASSPATH=/diskC/hadoop/hadoop/etc/hadoop/
export HBASE_MANAGES_ZK=false
Next edit hbase-site.xml. Since this is a cluster deployment, hbase.cluster.distributed is true; hbase.rootdir matches the address in core-site.xml from the Hadoop installation, with /hbase appended; and hbase.zookeeper.quorum lists the three nodes.
<property>
  <name>hbase.cluster.distributed</name>
  <value>true</value>
</property>
<property>
  <name>hbase.tmp.dir</name>
  <value>/diskC/hadoop/hbase/tmp</value>
</property>
<property>
  <name>hbase.unsafe.stream.capability.enforce</name>
  <value>false</value>
</property>
<property>
  <name>hbase.rootdir</name>
  <value>hdfs://node1:8020/hbase</value>
</property>
<property>
  <name>hbase.zookeeper.quorum</name>
  <value>node1,node2,node3</value>
</property>
<property>
  <name>hbase.zookeeper.property.dataDir</name>
  <value>/diskC/hadoop/zookeeper/data</value>
  <description>Property from ZooKeeper's config zoo.cfg.
    The directory where the snapshot is stored.
  </description>
</property>
<property>
  <name>hbase.zookeeper.property.clientPort</name>
  <value>2181</value>
  <description>Property from ZooKeeper's config zoo.cfg.
    The port at which the clients will connect.
  </description>
</property>
</configuration>
Then edit regionservers to list the three nodes:
vim regionservers
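The file simply lists one region-server host per line. A sketch using the node names already in /etc/hosts:

```shell
# regionservers: one hostname per line; HBase starts a RegionServer on each.
cat > regionservers <<'EOF'
node1
node2
node3
EOF
cat regionservers
```

If the names are not in /etc/hosts, use IP addresses here instead.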
Add the environment variables:
#======hbase===
export HBASE_HOME=/home/hadoop/hbase/
export PATH=$PATH:$HBASE_HOME/bin:$HBASE_HOME/conf
As before, package the directory and distribute it to the other two servers. Note that the corresponding paths in hbase-env.sh and hbase-site.xml may differ between servers; check them carefully.
HBase is started with the following command, which only needs to be run on the 64 master node:
start-hbase.sh
To stop it:
stop-hbase.sh
Elasticsearch installation
Extract and rename, on the 64 server only:
tar zxvf elasticsearch-6.6.0.tar.gz
mv elasticsearch-6.6.0 elasticsearch
Optionally add environment variables in ~/.bashrc, then start it:
bin/elasticsearch
I started it in the background with nohup:
nohup bin/elasticsearch > logs/es.log 2>&1 &
JanusGraph installation
Extract and rename, on the 64 server only.
We will run JanusGraph in server/client mode. Enter the conf directory to edit:
cd janusGraph/conf
vim janusgraph-hbase-es.properties
The key changes: storage.hostname is set to the addresses of the three nodes, and index.search.hostname likewise.
# Copyright 2019 JanusGraph Authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# JanusGraph configuration sample: HBase and Elasticsearch
#
# This file connects to HBase using a Zookeeper quorum
# (storage.hostname) consisting solely of localhost. It also connects
# to Elasticsearch running on localhost over Elasticsearch's native "Transport"
# protocol. Zookeeper, the HBase services, and Elasticsearch must already
# be running and available before starting JanusGraph with this file.

# The implementation of graph factory that will be used by gremlin server
#
# Default: org.janusgraph.core.JanusGraphFactory
# Data Type: String
# Mutability: LOCAL
gremlin.graph=org.janusgraph.core.JanusGraphFactory

# The primary persistence provider used by JanusGraph. This is required.
# It should be set one of JanusGraph's built-in shorthand names for its
# standard storage backends (shorthands: berkeleyje, cassandrathrift,
# cassandra, astyanax, embeddedcassandra, cql, hbase, inmemory) or to the
# full package and classname of a custom/third-party StoreManager
# implementation.
#
# Default: (no default value)
# Data Type: String
# Mutability: LOCAL
storage.backend=hbase

# The hostname or comma-separated list of hostnames of storage backend
# servers. This is only applicable to some storage backends, such as
# cassandra and hbase.
#
# Default: 127.0.0.1
# Data Type: class java.lang.String[]
# Mutability: LOCAL
#storage.hostname=127.0.0.1
storage.hostname=XXX.XXX.XXX.64, XXX.XXX.XXX.178, XXX.XXX.XXX.179

# Whether to enable JanusGraph's database-level cache, which is shared
# across all transactions. Enabling this option speeds up traversals by
# holding hot graph elements in memory, but also increases the likelihood
# of reading stale data. Disabling it forces each transaction to
# independently fetch graph elements from storage before reading/writing
# them.
#
# Default: false
# Data Type: Boolean
# Mutability: MASKABLE
cache.db-cache = true

# How long, in milliseconds, database-level cache will keep entries after
# flushing them. This option is only useful on distributed storage
# backends that are capable of acknowledging writes without necessarily
# making them immediately visible.
#
# Default: 50
# Data Type: Integer
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend. After starting the database for the first
# time, this file's copy of this setting is ignored. Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
cache.db-cache-clean-wait = 20

# Default expiration time, in milliseconds, for entries in the
# database-level cache. Entries are evicted when they reach this age even
# if the cache has room to spare. Set to 0 to disable expiration (cache
# entries live forever or until memory pressure triggers eviction when set
# to 0).
#
# Default: 10000
# Data Type: Long
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend. After starting the database for the first
# time, this file's copy of this setting is ignored. Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
cache.db-cache-time = 180000

# Size of JanusGraph's database level cache. Values between 0 and 1 are
# interpreted as a percentage of VM heap, while larger values are
# interpreted as an absolute size in bytes.
#
# Default: 0.3
# Data Type: Double
# Mutability: MASKABLE
cache.db-cache-size = 0.5

# The indexing backend used to extend and optimize JanusGraph's query
# functionality. This setting is optional. JanusGraph can use multiple
# heterogeneous index backends. Hence, this option can appear more than
# once, so long as the user-defined name between "index" and "backend" is
# unique among appearances. Similar to the storage backend, this should be
# set to one of JanusGraph's built-in shorthand names for its standard
# index backends (shorthands: lucene, elasticsearch, es, solr) or to the
# full package and classname of a custom/third-party IndexProvider
# implementation.
#
# Default: elasticsearch
# Data Type: String
# Mutability: GLOBAL_OFFLINE
#
# Settings with mutability GLOBAL_OFFLINE are centrally managed in
# JanusGraph's storage backend. After starting the database for the first
# time, this file's copy of this setting is ignored. Use JanusGraph's
# Management System to read or modify this value after bootstrapping.
index.search.backend=elasticsearch

# The hostname or comma-separated list of hostnames of index backend
# servers. This is only applicable to some index backends, such as
# elasticsearch and solr.
#
# Default: 127.0.0.1
# Data Type: class java.lang.String[]
# Mutability: MASKABLE
index.search.hostname=XXX.XXX.XXX.64, XXX.XXX.XXX.178, XXX.XXX.XXX.179
Enter gremlin-server and edit gremlin-server.yaml; the main change is the graph entry under graphs:
host: 0.0.0.0
port: 8182
scriptEvaluationTimeout: 30000
channelizer: org.apache.tinkerpop.gremlin.server.channel.WsAndHttpChannelizer
#channelizer: org.apache.tinkerpop.gremlin.server.channel.WebSocketChannelizer
graphs: {
  # graph: conf/janusgraph-hbase-es.properties
  graph: conf/janusgraph-hbase-es.properties
}
scriptEngines: {
  gremlin-groovy: {
    plugins: { org.janusgraph.graphdb.tinkerpop.plugin.JanusGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.server.jsr223.GremlinServerGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.tinkergraph.jsr223.TinkerGraphGremlinPlugin: {},
               org.apache.tinkerpop.gremlin.jsr223.ImportGremlinPlugin: {classImports: [java.lang.Math], methodImports: [java.lang.Math#*]},
               org.apache.tinkerpop.gremlin.jsr223.ScriptFileGremlinPlugin: {files: [scripts/empty-sample.groovy]}}}}
serializers:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV3d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV3d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  # Older serialization versions for backwards compatibility:
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoLiteMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GryoMessageSerializerV1d0, config: { serializeResultToString: true }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV2d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistry] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerGremlinV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
  - { className: org.apache.tinkerpop.gremlin.driver.ser.GraphSONMessageSerializerV1d0, config: { ioRegistries: [org.janusgraph.graphdb.tinkerpop.JanusGraphIoRegistryV1d0] }}
processors:
  - { className: org.apache.tinkerpop.gremlin.server.op.session.SessionOpProcessor, config: { sessionTimeout: 28800000 }}
  - { className: org.apache.tinkerpop.gremlin.server.op.traversal.TraversalOpProcessor, config: { cacheExpirationTime: 600000, cacheMaxSize: 1000 }}
metrics: {
  consoleReporter: {enabled: true, interval: 180000},
  csvReporter: {enabled: true, interval: 180000, fileName: /tmp/gremlin-server-metrics.csv},
  jmxReporter: {enabled: true},
  slf4jReporter: {enabled: true, interval: 180000},
  gangliaReporter: {enabled: false, interval: 180000, addressingMode: MULTICAST},
  graphiteReporter: {enabled: false, interval: 180000}}
maxInitialLineLength: 4096
maxHeaderSize: 8192
maxChunkSize: 8192
maxContentLength: 65536
With the configuration done, start the remote server:
bin/gremlin-server.sh ./conf/gremlin-server/gremlin-server.yaml
My screenshots are all of the logs, and some file locations differ slightly.
A quick test. Note that :remote console sends each input line to Gremlin Server as an independent, sessionless script, so a variable such as user defined on one line is not visible on the next; that is what produces the error at the end of the transcript below.
$ bin/gremlin.sh

         \,,,/
         (o o)
-----oOOo-(3)-oOOo-----
plugin activated: tinkerpop.server
plugin activated: tinkerpop.hadoop
plugin activated: tinkerpop.utilities
plugin activated: janusgraph.imports
plugin activated: tinkerpop.tinkergraph
gremlin> :remote connect tinkerpop.server conf/remote.yaml
==>Configured localhost/127.0.0.1:8182
gremlin> :remote console
==>All scripts will now be sent to Gremlin Server - [localhost/127.0.0.1:8182] - type ':remote console' to return to local mode
gremlin> graph
==>standardjanusgraph[cql:[127.0.0.1]]
gremlin> g
==>graphtraversalsource[standardjanusgraph[cql:[127.0.0.1]], standard]
gremlin> g.V()
gremlin> user = "Chris"
==>Chris
gremlin> graph.addVertex("name", user)
No such property: user for class: Script21
Type ':help' or ':h' for help.
Display stack trace? [yN]
Verifying the Installation
The 64 master server
The 178 server
The 179 server (set up as the backup master during the HBase installation)
View via master-server-ip:8088
View via master-server-ip:50050
View via master-server-ip:16010
Troubleshooting
- There were many problems along the way. All the ports above are the defaults; make sure none of them is already occupied, otherwise the service will not start.
- JanusGraph has many more configuration options that deserve attention; I won't go through them one by one here. Feel free to discuss in the comments, and read the official docs: JanusGraph official documentation.
- Many small details could not be covered; differences between servers caused quite a few minor issues. Questions and discussion are welcome.
- This was genuinely tiring to write. Next up, I will write about JanusGraph applications.