On the 28th of last month, the mobile messenger KakaoTalk went down for about four hours. It was the longest service interruption since KakaoTalk launched in March 2010.

On the 30th, Kakao released an official statement: "The service was interrupted for roughly four hours due to a problem in the IDC's power system," adding, "To be clear, the outage was not caused by a power-supply problem due to traffic overload, nor by a failure in our server farm."

This outage, then, appears to come down to the servers rather than to KakaoTalk itself. But it is not the first time KakaoTalk has gone down; the service has experienced about four server outages so far.

The services that gained from KakaoTalk's outage were NAVER LINE and TicToc, both substitutes of the same kind as KakaoTalk.

Anyone who has used all three services (KakaoTalk, NAVER LINE, and TicToc) will have noticed that LINE and TicToc deliver messages faster than KakaoTalk.

They are all mobile messengers, so why the difference in delivery speed? Server performance is part of it, but the application's architecture also plays a role.

In this post, we will look at how NAVER Japan (the developer of NAVER LINE) went about making LINE fast.

(The reason for singling out NAVER LINE is that NAVER Japan is the only messenger developer to have disclosed its source code.)

We should start with how NAVER LINE began.

LINE first appeared in the Japanese market in June of last year. It is a mobile messenger launched by a project team assembled directly by NHN chairman Lee Hae-jin and, unlike NAVER Talk, which NHN was already operating, it focused purely on instant chat. NAVER Japan then added mobile VoIP (m-VoIP) in October 2011 and kept growing its user base.

LINE has now passed 30 million registered users worldwide and is chasing KakaoTalk.

According to the NAVER Japan engineering blog (http://tech.naver.jp/blog/?p=1420), LINE is built on a NoSQL DBMS (database management system).

Unlike a relational DBMS, NoSQL is a non-relational DBMS, which lets it handle very large volumes of data flexibly.

If you build a mobile messenger or a social network service (SNS) on a relational DBMS, every update is checked for consistency and validity, which can create bottlenecks.

In a messenger service, a bottleneck means the DBMS cannot keep up when large numbers of messages are sent and received at once; in other words, the servers cannot handle the message volume.

That is why Twitter and Facebook adopted NoSQL early on: the service can keep running even when the read-to-write ratio for new data (posts) approaches 5:5.


Back to LINE. NAVER Japan originally used Redis as LINE's architecture. Redis is one kind of NoSQL store.

NAVER Japan made full use of Redis's strengths, such as flexible synchronous/asynchronous operation and master-slave replication, but overlooked its weakness: expanding its data storage capacity is hard.

At first, NAVER Japan expected LINE to attract fewer than one million users at most. But within half a year the user base passed five million, and the team had to decide whether to expand the existing Redis cluster or overhaul the architecture.

In the process, NAVER Japan decided to adopt a new NoSQL store and narrowed the candidates to HBase, Cassandra, and MongoDB. The selection criteria were the three things that matter most for a mobile messenger: scalability, availability, and cost.

NAVER Japan chose HBase, which runs fast on top of the Hadoop file system, and migrated to it. Cassandra beat the other two on data storage and availability, but HBase best satisfied the overall requirements.

LINE became faster after migrating from Redis to HBase: clusters can be shared, and reads and writes are load-balanced, so the system copes even when a large amount of data arrives at once.

Solving the limits of an existing service with cloud infrastructure and open source, and publishing the process, should have a positive influence on the rest of the industry.


[Source: http://www.ddaily.co.kr/news/news_view.php?uid=90737]






LINE Storage: Storing billions of rows in Sharded-Redis and HBase per Month

by sunsuk7tp on 2012.4.26


Hi, I’m Shunsuke Nakamura (@sunsuk7tp). Just half a year ago, I completed the Computer Science Master’s program at Tokyo Tech and joined NHN Japan as a member of the LINE server team. My ambition is to hack on distributed processing and storage systems and to develop the next generation’s architecture.

In the LINE server team, I’m in charge of the development and operation of the advanced storage system that manages LINE’s messages, contacts, and groups.

Today, I’ll briefly introduce the LINE storage stack.

LINE Beginning with Redis [2011.6 ~]

In the beginning, we adopted Redis as LINE’s primary storage. LINE is aimed at instant messaging, quickly exchanging messages, and the scale was assumed to be at most 1 million registered users in total within 2011. Redis is an in-memory data store and does its intended job well. It also lets us take periodic snapshots to disk and supports synchronous/asynchronous master-slave replication. We decided that Redis was the best choice despite the scalability and availability issues that come with an in-memory data store. The entire LINE storage system started with just a single Redis cluster of 3 nodes, sharded on the client side.

As the service grew, more nodes were needed, and client-side sharding prevented us from scaling effectively. Vanilla Redis still does not support server-side sharding, so we built our own clustering manager and now run sharded Redis clusters with it. Our sharded Redis cluster is coordinated by the cluster manager daemons and a ZooKeeper quorum.

This manager has the following characteristics:

  • Sharding management by ZooKeeper (Consistent hashing, compatible with other algorithms)
  • Failure detection and auto/manual failover between master and slave
  • Scales out with minimal downtime (< 10 sec)

Currently, several sharded Redis clusters are running with hundreds of servers.

[Figure: Sharded Redis Cluster and Management tool]
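
To give a feel for what the failover step involves, here is a minimal sketch using stock redis-cli replication commands; the host names are hypothetical, and Redis of that era used SLAVEOF (later renamed REPLICAOF). The cluster manager automates these steps, along with updating the shard map in ZooKeeper.

# Inspect replication state on the slave that should take over
redis-cli -h shard3-slave -p 6379 INFO replication

# Promote the slave to master
redis-cli -h shard3-slave -p 6379 SLAVEOF NO ONE

# When the failed master comes back, demote it to a slave of the new master
redis-cli -h shard3-old-master -p 6379 SLAVEOF shard3-slave 6379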

Tolerating Unpredictable Scaling [2011.10 ~]

However, the situation has changed greatly since then. Around October 2011, LINE experienced tremendous growth in many parts of the world, and operating costs increased as well.

A major driver of the increased cost was scaling the Redis cluster’s capacity. Operating a Redis cluster under unpredictable growth is much harder than with other persistent stores because, being an in-memory data store, it needs more servers. To safely use availability features such as snapshots and full replication, memory usage must be managed carefully. Redis VM (Virtual Memory) helps somewhat, but it can significantly impair performance depending on how heavily it is used.

For the above reasons, we often misjudged the timing to scale out and encountered some outages. It then became critical to migrate to a more highly scalable system with high availability.

Almost overnight, LINE’s target changed to a scale of tens to hundreds of millions of registered users.

This is how we tackled the problem.

Data Scalability

At first, we analyzed the order of magnitude for each database.

(n: # of Users)
(t: Lifetime of LINE System)

  • O(1)
    • Messages in delivery queue
    • Asynchronous jobs in job queue
  • O(n)
    • User Profile
    • Contacts / Groups
      • This data would naturally grow as O(n^2), but the number of links between users is capped (= O(n * CONSTANT_SIZE)).
  • O(n*t)
    • Messages in Inbox
    • Change-sets of User Profile / Groups / Contacts

Rows stored in LINE storage have increased exponentially. In the near future, we will deal with tens of billions of rows per month.

Data Requirement

Second, we summarized our data requirements for each usage scenario.

  • O(1)
    • Availability
    • Workload: very fast reads and writes
  • O(n)
    • Availability, Scalability
    • Workload: fast random reads
  • O(n*t)
    • Scalability, Massive volume (Billions of small rows per day, but mostly cold data)
    • Workload: fast sequential writes (append-only) and fast reads of the latest data

Choosing Storage

Finally, we chose suitable storage for each of the above requirements. To tune each storage system’s configuration and to determine which one best fits the LINE app’s workloads, we benchmarked several candidates with tools such as YCSB (Yahoo! Cloud Serving Benchmark) and with our own benchmark that simulates those workloads. As a result, we decided to use HBase as the primary storage for data with exponential growth patterns, such as the message timeline. HBase’s characteristics suit the message timeline, a “latest” workload in which the most recently inserted records sit at the head of the distribution.
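
As an illustration of that kind of benchmarking (not LINE’s actual configuration), a YCSB run typically loads a data set and then replays a workload against a storage binding; binding names, workload files, and invocation style vary by YCSB version:

# Load 10 million records into HBase using YCSB core workload A
bin/ycsb load hbase -P workloads/workloada -p columnfamily=f1 -p recordcount=10000000

# Replay the workload (50/50 reads and updates) and report throughput and latency
bin/ycsb run hbase -P workloads/workloada -p columnfamily=f1 -p operationcount=10000000 -threads 64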

  • O(1)
    • Redis is the best choice.
  • O(n), O(n*t)
    • There are several candidates.
    • HBase
      • pros:
        • Best matches our requirements
        • Easy to operate (Storage system built on DFS, multiple ad hoc partitions per server)
      • cons:
        • Random read and deletion are somewhat slow.
        • Slightly lower availability (there are some SPOFs)
    • Cassandra (My favorite NoSQL)
      • pros:
        • Also suitable for dealing with the latest workload
        • High Availability (decentralized architecture, rack/DC-aware replication)
      • cons:
        • High operation costs due to weak consistency
        • Counter increments are expected to be slightly slower.
    • MongoDB
      • pros:
        • Auto sharding, auto failover
        • A rich range of operations (but LINE storage doesn’t require most of them.)
      • cons:
        • NOT suitable for the timeline workload (B-tree indexing)
        • Ineffective disk and network utilization

In summary, the LINE storage layer is currently constructed as follows:

  1. Standalone Redis: asynchronous job and message queuing
    • A Redis queue and a queue dispatcher run together on each application server (a minimal sketch of this queue pattern appears below).
  2. Sharded Redis: front-end cache for O(n*t) data and primary storage for O(n) data
  3. Backup MySQL: secondary storage (for backup, statistics)
  4. HBase: primary storage for data with O(n*t)
    • We expect to operate hundreds of terabytes of data on each cluster, with hundreds to a thousand servers.

LINE’s main storage is built from about 600 nodes and continues to grow month after month.
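
Layer 1 above, the standalone Redis used for job and message queuing, boils down to the classic Redis list-as-queue pattern. A minimal redis-cli sketch (the key name and payload are made up; LINE’s dispatcher is custom code):

# Producer: an application server pushes a job onto the queue
redis-cli LPUSH queue:jobs '{"type":"push_notification","user":12345}'

# Consumer: the dispatcher blocks until a job arrives, then pops it
redis-cli BRPOP queue:jobs 0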

[Figure: LINE Storage Stack]

Data Migration from Redis to HBase

We gradually migrated tens of terabytes of data from the Redis cluster to the HBase cluster, in three phases:

  1. Write to both Redis and HBase; read only from Redis
  2. Run a migration script on the backend (sequentially retrieve data from Redis and write it to HBase)
  3. Write to both Redis (with TTL) and HBase (without TTL) and read from both (Redis effectively becomes a cache server)

One thing to note is that recent data must never be overwritten with older data: the migrated data is mostly append-only, and consistency for the rest is maintained using the timestamps of HBase columns.
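
To make phase 3 concrete, here is a hedged sketch of the dual write using plain redis-cli and the HBase shell. The key, table, and column names are hypothetical; the point is that the Redis write carries a TTL while the HBase put uses the message time as the cell timestamp, so a replayed older write cannot clobber a newer one.

# Cache the message in Redis with a one-day TTL
redis-cli SETEX inbox:42:10021 86400 'hello'

# Persist the same message in HBase without a TTL, stamping the cell with the send time
echo "put 'inbox', '42_10021', 'm:body', 'hello', 1335417600000" | hbase shell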

HBase and HDFS

A number of HBase clusters have been running mostly stably on HDFS. We built an HBase cluster for each database (e.g., messages, contacts), and each cluster is tuned to that database’s workload. They share a single HDFS cluster of 100 servers, each with 32GB of memory and 1.5TB of disk. Each RegionServer hosts about 50 small regions rather than a single 10GB region. Read performance in a Bigtable-like architecture is affected by (major) compaction, so region sizes are kept small enough to avoid continuous major compaction, especially during peak hours. During off-peak hours a periodic cron job automatically splits large regions into smaller ones, while operators perform load balancing manually. Of course, HBase has automatic splitting and load balancing, but we find it best to manage these manually given our service requirements.
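
The manual splitting and balancing described above maps onto ordinary HBase shell operations. A hedged sketch follows; the table name and region name are placeholders, and exact command forms differ slightly between HBase versions:

# Turn off the automatic balancer so operators control region placement
echo "balance_switch false" | hbase shell

# Split an oversized region during off-peak hours (use the region name from the web UI)
echo "split 'REGION_NAME'" | hbase shell

# Trigger a major compaction for a table once traffic is low
echo "major_compact 'messages'" | hbase shell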

As the service grows, scalability remains one of the important issues. We plan to place at most a few hundred servers per cluster. Each message has a TTL, and messages are partitioned across multiple clusters in units of TTL. That way, an old cluster in which every message has expired can be fully truncated and reused as a new cluster.
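
Recycling an expired cluster then amounts to truncating its tables; an illustrative HBase shell command with a placeholder table name:

# Every row in this cluster has passed its TTL, so wipe the table for reuse
echo "truncate 'inbox_2012w01'" | hbase shell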

Current and future challenges [2012]

Since migrating to HBase, LINE storage has been operating more stably. Each HBase cluster currently processes several times as many requests as it did during the New Year peak. Even so, storage-related failures still occur from time to time. We are left with the following availability issues for the HBase and Redis clusters.

  • A redundant configuration and failover scheme with no single point of failure in any component, including rack/DC awareness
    • We are examining replication at various layers, such as full replication and SSTable- or block-level replication between HDFS clusters.
  • Compensation for failures between clusters (Redis cluster, HBase, and multi-HBase clusters)

HA-NameNode

As you may already know, the NameNode is a single point of failure in HDFS. Although the NameNode process itself rarely fails (note: experience at Yahoo!), other software failures and hardware failures such as disk or network failures are bound to occur. A NameNode failover procedure is therefore required to achieve high availability.

There are the several HA-NameNode configurations:

  • High Availability Framework for HDFS NameNode (HDFS-1623)
  • Backup NameNode (0.21)
  • Avatar NameNode (Facebook)
  • HA NameNode using Linux HA
  • Active/passive configuration deploying two NameNodes (Cloudera)

We configure HA-NameNode using Linux HA. Each component of Linux-HA has a role similar to the following:

  • DRBD: Disk mirroring
  • Heartbeat / (Corosync): network failure detector
  • Pacemaker: Failover definition
  • service: NameNode, Secondary NameNode, VIP

[Figure: HA NameNode using Linux HA]

DRBD (Distributed Replicated Block Device) provides block-level replication; essentially, it is a network RAID-1 driver. Heartbeat monitors the network link to the other server. If Heartbeat detects a hardware or service outage, it switches the primary/secondary roles in DRBD and starts each service daemon according to the logic defined in Pacemaker.
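
Before relying on automatic failover, it is worth checking the replication and cluster state by hand. A few standard commands (the resource name matches the drbd01 resource defined in the appendix below):

# DRBD connection state, roles (Primary/Secondary), and sync status
cat /proc/drbd
drbdadm role drbd01

# One-shot snapshot of Heartbeat/Pacemaker node and resource status
crm_mon -1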

Conclusion

Thus far we have faced various scalability and availability challenges as LINE has grown, and our storage and strategies are still far from mature given extreme scaling and the many possible failure cases. We hope to keep growing along with the future growth of LINE.

Appendix: How to setup HA-NameNode using Linux-HA

In the rest of this entry, I will describe how to build an HA NameNode using two CentOS 5.4 servers and Linux-HA. The setup assumes the following environment.

  • Hosts:
    1. NAMENODE01: 192.168.32.1 (bonding)
    2. NAMENODE02: 192.168.32.2 (bonding)
  • OS: CentOS 5.4
  • DRBD (v8.0.16):
    • conf file: ${DEPLOY_HOME}/ha/drbd.conf
    • resource name: drbd01
    • mount disk: /dev/sda3
    • mount device: /dev/drbd0
    • mount directory: /data/namenode
  • Heartbeat (v3.0.3):
    • conf file: ${DEPLOY_HOME}/ha/haresources, authkeys
  • Pacemaker (v1.0.12)
  • service daemons
    • VIP: 192.168.32.3
    • Hadoop NameNode, SecondaryNameNode (v1.0.2, the latest release at the time of writing)

Configuration

Configure the drbd and heartbeat settings in your deploy home directory, ${DEPLOY_HOME}.

  • drbd.conf
global { usage-count no; }
 
resource drbd01 {
  protocol  C;
  syncer { rate 100M; }
  startup { wfc-timeout 0; degr-wfc-timeout 120; }
 
  on NAMENODE01 {
    device /dev/drbd0;
    disk    /dev/sda3;
    address 192.168.32.1:7791;
    meta-disk   internal;
  }
  on NAMENODE02 {
    device /dev/drbd0;
    disk    /dev/sda3;
    address 192.168.32.2:7791;
    meta-disk   internal;
  }
}
  • ha.cf
debugfile ${HOME}/logs/ha/ha-debug
logfile ${HOME}/logs/ha/ha-log
logfacility local0
pacemaker on
keepalive 1
deadtime 5
initdead 60
udpport 694
auto_failback off
node    NAMENODE01 NAMENODE02
  • haresources (Can skip this step when using pacemaker)
# <primary hostname> <vip> <drbd> <local fs path> <running daemon name>
NAMENODE01 IPaddr::192.168.32.3 drbddisk::drbd01 Filesystem::/dev/drbd0::/data/namenode::ext3::defaults hadoop-1.0.2-namenode
  • authkeys
auth 1
1 sha1 hadoop-namenode-cluster

Installation of Linux-HA

The Pacemaker and Heartbeat 3.0 packages are not included in the default base and updates repositories of CentOS 5. Before installation, you first need to add the Cluster Labs repo:

wget -O /etc/yum.repos.d/clusterlabs.repo http://clusterlabs.org/rpm/epel-5/clusterlabs.repo

Then run the following script:

yum install -y drbd kmod-drbd heartbeat pacemaker
 
# logs
mkdir -p ${HOME}/logs/ha
mkdir -p ${HOME}/data/pids/hadoop
 
# drbd
cd ${DRBD_HOME}
ln -sf ${DEPLOY_HOME}/ha/drbd.conf drbd.conf
echo "/dev/drbd0 /data/namenode ext3 defaults,noauto 0 0" >> /etc/fstab
yes | drbdadm create-md drbd01
 
# heartbeat
cd ${HA_HOME}
ln -sf ${DEPLOY_HOME}/ha/ha.cf ha.cf
ln -sf ${DEPLOY_HOME}/ha/haresources haresources
cp ${DEPLOY_HOME}/ha/authkeys authkeys
chmod 600 authkeys
 
chown -R www.www ${HOME}/logs
chown -R www.www ${HOME}/data
chown -R www.www /data/namenode
 
chkconfig --add heartbeat
chkconfig heartbeat on

DRBD Initialization and Running heartbeat

  1. Run the drbd service on both the primary and the secondary:
    # service drbd start
  2. Initialize DRBD and format the NameNode on the primary:
    # drbdadm -- --overwrite-data-of-peer primary drbd01
    # mkfs.ext3 /dev/drbd0
    # mount /dev/drbd0
    $ hadoop namenode -format
    # umount /dev/drbd0
    # service drbd stop
  3. Run heartbeat on both the primary and the secondary:
    # service heartbeat start

Daemonize hadoop processes (Apache Hadoop)

When using Apache Hadoop, you need to daemonize each process, such as the NameNode and SecondaryNameNode, so that Heartbeat can start and stop them. The following script, “hadoop-1.0.2-namenode”, is an example init script for the NameNode daemon.

  • /etc/init.d/hadoop-1.0.2-namenode
#!/bin/sh
 
BASENAME=$(basename $0)
# e.g. "hadoop-1.0.2-namenode" -> HADOOP_RELEASE="hadoop-1.0.2", SVNAME="namenode"
HADOOP_RELEASE=$(echo $BASENAME | awk '{n = split($0, a, "-"); s = a[1]; for(i = 2; i < n; ++i) s = s "-" a[i]; print s}')
SVNAME=$(echo $BASENAME | awk '{n = split($0, a, "-"); print a[n]}')
 
DAEMON_CMD=/usr/local/${HADOOP_RELEASE}/bin/hadoop-daemon.sh
[ -f $DAEMON_CMD ] || exit 1
 
# Start/stop the Hadoop daemon (namenode, secondarynamenode, ...) via hadoop-daemon.sh
start()
{
    $DAEMON_CMD start $SVNAME
    RETVAL=$?
}
 
stop()
{
    $DAEMON_CMD stop $SVNAME
    RETVAL=$?
}
 
RETVAL=0
case "$1" in
    start)
        start
        ;;
 
    stop)
        stop
        ;;
 
    restart)
        stop
        sleep 2
        start
        ;;
 
    *)
        echo "Usage: ${HADOOP_RELEASE}-${SVNAME} {start|stop|restart}"
        exit 1
    ;;
esac
exit $RETVAL

Second, place a resource agent script so that Pacemaker can control this daemon service. Pacemaker’s OCF scripts live under /usr/lib/ocf/resource.d/.

  • /usr/lib/ocf/resource.d/nhnjp/Hadoop
#!/bin/bash
#
# Resource script for Hadoop service
#
# Description:  Manages Hadoop service as an OCF resource in
#               an High Availability setup.
#
#
#   usage: $0 {start|stop|status|monitor|validate-all|meta-data}
#
#   The "start" arg starts Hadoop service.
#
#   The "stop" arg stops it.
#
# OCF parameters:
# OCF_RESKEY_hadoopversion
# OCF_RESKEY_hadoopsvname
#
# Note:This RA uses 'jps' command to identify Hadoop process
##########################################################################
# Initialization:
 
: ${OCF_FUNCTIONS_DIR=${OCF_ROOT}/lib/heartbeat}
. ${OCF_FUNCTIONS_DIR}/ocf-shellfuncs
 
USAGE="Usage: $0 {start|stop|status|monitor|validate-all|meta-data}";
 
##########################################################################
 
usage()
{
    echo $USAGE >&2
}
 
meta_data()
{
cat <<END
<?xml version="1.0"?>
<!DOCTYPE resource-agent SYSTEM "ra-api-1.dtd">
<resource-agent name="Hadoop">
<version>1.0</version>
<longdesc lang="en">
This script manages Hadoop service.
</longdesc>
<shortdesc lang="en">Manages an Hadoop service.</shortdesc>
 
<parameters>
 
<parameter name="hadoopversion">
<longdesc lang="en">
Hadoop version identifier: hadoop-[version]
For example, "1.0.2" or "0.20.2-cdh3u3"
</longdesc>
<shortdesc lang="en">hadoop version string</shortdesc>
<content type="string" default="1.0.2"/>
</parameter>
 
<parameter name="hadoopsvname">
<longdesc lang="en">
Hadoop service name.
One of namenode|secondarynamenode|datanode|jobtracker|tasktracker
</longdesc>
<shortdesc lang="en">hadoop service name</shortdesc>
<content type="string" default="none"/>
</parameter>
 
</parameters>
 
<actions>
<action name="start" timeout="20s"/>
<action name="stop" timeout="20s"/>
<action name="monitor" depth="0" timeout="10s" interval="5s" />
<action name="validate-all" timeout="5s"/>
<action name="meta-data"  timeout="5s"/>
</actions>
</resource-agent>
END
exit $OCF_SUCCESS
}
 
HADOOP_VERSION="hadoop-${OCF_RESKEY_hadoopversion}"
HADOOP_HOME="/usr/local/${HADOOP_VERSION}"
[ -f "${HADOOP_HOME}/conf/hadoop-env.sh" ] && . "${HADOOP_HOME}/conf/hadoop-env.sh"
 
HADOOP_SERVICE_NAME="${OCF_RESKEY_hadoopsvname}"
HADOOP_PID_FILE="${HADOOP_PID_DIR}/hadoop-www-${HADOOP_SERVICE_NAME}.pid"
 
trace()
{
    ocf_log $@
    timestamp=$(date "+%Y-%m-%d %H:%M:%S")
    echo "$timestamp ${HADOOP_VERSION}-${HADOOP_SERVICE_NAME} $@" >> /dev/null
}
 
Hadoop_status()
{
    trace "Hadoop_status()"
    if [ -n "${HADOOP_PID_FILE}" -a -f "${HADOOP_PID_FILE}" ]; then
        # Hadoop is probably running
        HADOOP_PID=`cat "${HADOOP_PID_FILE}"`
        if [ -n "$HADOOP_PID" ]; then
            if ps f -p $HADOOP_PID | grep -qwi "${HADOOP_SERVICE_NAME}" ; then
                trace info "Hadoop ${HADOOP_SERVICE_NAME} running"
                return $OCF_SUCCESS
            else
                trace info "Hadoop ${HADOOP_SERVICE_NAME} is not running but pid file exists"
                return $OCF_NOT_RUNNING
            fi
        else
            trace err "PID file empty!"
            return $OCF_ERR_GENERIC
        fi
    fi
 
    # Hadoop is not running
    trace info "Hadoop ${HADOOP_SERVICE_NAME} is not running"
    return $OCF_NOT_RUNNING
}
 
Hadoop_start()
{
    trace "Hadoop_start()"
    # if Hadoop is running return success
    Hadoop_status
    retVal=$?
    if [ $retVal -eq $OCF_SUCCESS ]; then
        exit $OCF_SUCCESS
    elif [ $retVal -ne $OCF_NOT_RUNNING ]; then
        trace err "Error. Unknown status."
        exit $OCF_ERR_GENERIC
    fi
 
    service ${HADOOP_VERSION}-${HADOOP_SERVICE_NAME} start
    if [ $? -ne 0 ]; then
        trace err "Error. Hadoop ${HADOOP_SERVICE_NAME} returned error $?."
        exit $OCF_ERR_GENERIC
    fi
 
    trace info "Started Hadoop ${HADOOP_SERVICE_NAME}."
    exit $OCF_SUCCESS
}
 
Hadoop_stop()
{
    trace "Hadoop_stop()"
    if Hadoop_status ; then
        HADOOP_PID=`cat "${HADOOP_PID_FILE}"`
        if [ -n "$HADOOP_PID" ] ; then
            kill $HADOOP_PID
            if [ $? -ne 0 ]; then
                kill -s KILL $HADOOP_PID
                if [ $? -ne 0 ]; then
                    trace err "Error. Could not stop Hadoop ${HADOOP_SERVICE_NAME}."
                    return $OCF_ERR_GENERIC
                fi
            fi
            rm -f "${HADOOP_PID_FILE}" 2>/dev/null
        fi
    fi
    trace info "Stopped Hadoop ${HADOOP_SERVICE_NAME}."
    exit $OCF_SUCCESS
}
 
Hadoop_monitor()
{
    trace "Hadoop_monitor()"
    Hadoop_status
}
 
Hadoop_validate_all()
{
    trace "Hadoop_validate_all()"
    if [ -z "${OCF_RESKEY_hadoopversion}" ] || [ "${OCF_RESKEY_hadoopversion}" == "none" ]; then
        trace err "Invalid hadoop version: ${OCF_RESKEY_hadoopversion}"
        exit $OCF_ERR_ARGS
    fi
 
    if [ -z "${OCF_RESKEY_hadoopsvname}" ] || [ "${OCF_RESKEY_hadoopsvname}" == "none" ]; then
        trace err "Invalid hadoop service name: ${OCF_RESKEY_hadoopsvname}"
        exit $OCF_ERR_ARGS
    fi
 
    HADOOP_INIT_SCRIPT=/etc/init.d/${HADOOP_VERSION}-${HADOOP_SERVICE_NAME}
    if [ ! -d "${HADOOP_HOME}" ] || [ ! -x ${HADOOP_INIT_SCRIPT} ]; then
        trace err "Cannot find ${HADOOP_VERSION}-${HADOOP_SERVICE_NAME}"
        exit $OCF_ERR_ARGS
    fi
 
    if [ ! -L ${HADOOP_HOME}/conf ] || [ ! -f "$(readlink ${HADOOP_HOME}/conf)/hadoop-env.sh" ]; then
        trace err "${HADOOP_VERSION} isn't configured yet"
        exit $OCF_ERR_ARGS
    fi
 
    # TODO: do more strict checking
 
    return $OCF_SUCCESS
}
 
if [ $# -ne 1 ]; then
    usage
    exit $OCF_ERR_ARGS
fi
 
case $1 in
    start)
        Hadoop_start
        ;;
 
    stop)
        Hadoop_stop
        ;;
 
    status)
        Hadoop_status
        ;;
 
    monitor)
        Hadoop_monitor
        ;;
 
    validate-all)
        Hadoop_validate_all
        ;;
 
    meta-data)
        meta_data
        ;;
 
    usage)
        usage
        exit $OCF_SUCCESS
        ;;
 
    *)
        usage
        exit $OCF_ERR_UNIMPLEMENTED
        ;;
esac

Pacemaker settings

First, using the crm_mon command, verify whether the heartbeat process is running.

# crm_mon
Last updated: Thu Mar 29 17:32:36 2012
Stack: Heartbeat
Current DC: NAMENODE01 (bc16bea6-bed0-4b22-be37-d1d9d4c4c213)-partition with quorum
Version: 1.0.12
2 Nodes configured, unknown expected votes
0 Resources configured.
============
 
Online: [ NAMENODE01 NAMENODE02 ]

After verifying that the process is running, connect to Pacemaker using the crm command and configure its resource settings. (This step replaces the haresources configuration.)

crm(live)# configure
INFO: building help index
crm(live)configure# show
node $id="bc16bea6-bed0-4b22-be37-d1d9d4c4c213" NAMENODE01
node $id="25884ee1-3ce4-40c1-bdc9-c2ddc9185771" NAMENODE02
property $id="cib-bootstrap-options" \
        dc-version="1.0.12" \
        cluster-infrastructure="Heartbeat"
 
# if the cluster is composed of only two NameNode hosts, the following setting is needed.
crm(live)configure# property $id="cib-bootstrap-options" no-quorum-policy="ignore"
 
# vip setting
crm(live)configure# primitive ip_namenode ocf:heartbeat:IPaddr \
params ip="192.168.32.3"
 
# drbd setting
crm(live)configure# primitive drbd_namenode ocf:heartbeat:drbd \
        params drbd_resource="drbd01" \
        op start interval="0s" timeout="10s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
# drbd master/slave setting
crm(live)configure# ms ms_drbd_namenode drbd_namenode meta master-max="1" \
master-node-max="1" clone-max="2" clone-node-max="1" notify="true"
 
# fs mount setting
crm(live)configure# primitive fs_namenode ocf:heartbeat:Filesystem \
params device="/dev/drbd0" directory="/data/namenode" fstype="ext3"
 
# service daemon setting
primitive namenode ocf:nhnjp:Hadoop \
        params hadoopversion="1.0.2" hadoopsvname="namenode" \
        op monitor interval="5s" timeout="60s" on-fail="standby"
primitive secondarynamenode ocf:nhnjp:Hadoop \
        params hadoopversion="1.0.2" hadoopsvname="secondarynamenode" \
        op monitor interval="30s" timeout="60s" on-fail="restart"

Here, the ocf:${GROUP}:${SERVICE} resource type corresponds to /usr/lib/ocf/resource.d/${GROUP}/${SERVICE}, so place your own service script there. Likewise, lsb:${SERVICE} corresponds to /etc/init.d/${SERVICE}.

Finally, you can confirm Pacemaker’s settings using the show command.

crm(live)configure# show
node $id="bc16bea6-bed0-4b22-be37-d1d9d4c4c213" NAMENODE01
node $id="25884ee1-3ce4-40c1-bdc9-c2ddc9185771" NAMENODE02
primitive drbd_namenode ocf:heartbeat:drbd \
        params drbd_resource="drbd01" \
        op start interval="0s" timeout="10s" on-fail="restart" \
        op stop interval="0s" timeout="60s" on-fail="block"
primitive fs_namenode ocf:heartbeat:Filesystem \
        params device="/dev/drbd0" directory="/data/namenode" fstype="ext3"
primitive ip_namenode ocf:heartbeat:IPaddr \
        params ip="192.168.32.3"
primitive namenode ocf:nhnjp:Hadoop \
        params hadoopversion="1.0.2" hadoopsvname="namenode" \
        meta target-role="Started" \
        op monitor interval="5s" timeout="60s" on-fail="standby"
primitive secondarynamenode ocf:nhnjp:Hadoop \
        params hadoopversion="1.0.2" hadoopsvname="secondarynamenode" \
        meta target-role="Started" \
        op monitor interval="30s" timeout="60s" on-fail="restart"
group namenode-group fs_namenode ip_namenode namenode secondarynamenode
ms ms_drbd_namenode drbd_namenode \
        meta master-max="1" master-node-max="1" clone-max="2" \
        clone-node-max="1" notify="true" globally-unique="false"
colocation namenode-group_on_drbd inf: namenode-group ms_drbd_namenode:Master
order namenode_after_drbd inf: ms_drbd_namenode:promote namenode-group:start
property $id="cib-bootstrap-options" \
        dc-version="1.0.12" \
        cluster-infrastructure="Heartbeat" \
        no-quorum-policy="ignore" \
        stonith-enabled="false"

Once you’ve confirmed the configuration is correct, commit it using the commit command.

crm(live)configure# commit

Once you’ve run the commit command, Heartbeat starts each service following Pacemaker’s rules.
You can monitor whether nodes and resources are alive using the crm_mon command.

$crm_mon -A
 
============
Last updated: Tue Apr 10 12:40:11 2012
Stack: Heartbeat
Current DC: NAMENODE01 (bc16bea6-bed0-4b22-be37-d1d9d4c4c213)-partition with quorum
Version: 1.0.12
2 Nodes configured, unknown expected votes
2 Resources configured.
============
 
Online: [ NAMENODE01 NAMENODE02 ]
 
 Master/Slave Set: ms_drbd_namenode
     Masters: [ NAMENODE01 ]
     Slaves: [ NAMENODE02 ]
 Resource Group: namenode-group
     fs_namenode        (ocf::heartbeat:Filesystem):    Started NAMENODE01
     ip_namenode        (ocf::heartbeat:IPaddr):        Started NAMENODE01
     namenode   (ocf::nhnjp:Hadoop):    Started NAMENODE01
     secondarynamenode  (ocf::nhnjp:Hadoop):    Started NAMENODE01
 
Node Attributes:
* Node NAMENODE01:
    + master-drbd_namenode:0            : 75
* Node NAMENODE02:
    + master-drbd_namenode:1            : 75

Finally, you should run various failover tests, for example by killing each service daemon and simulating network failures with iptables.



Source: http://tech.naver.jp/blog/?p=1420






Differences in Certificate Validation for HTTPS, SSL over TCP, and SOAP Security

.NET Framework 4.5

You can use certificates in Windows Communication Foundation (WCF) with HTTPS (Transport Layer Security (TLS) over HTTP) or with message-layer (SOAP) security over TLS over TCP. This topic describes the differences in how these certificates are validated.

When a client communicates with a service over HTTPS, the certificate the service uses to authenticate itself to the client must support a trust chain; that is, it must chain up to a trusted root certification authority. If it does not, the HTTP layer throws a WebException with the message "The remote server returned an error: (403) Forbidden." WCF surfaces this exception as a MessageSecurityException.

When a client communicates with a service over HTTPS, the certificate the server authenticates with must, by default, support a trust chain; that is, it must chain up to a trusted root certification authority. No online check is performed to determine whether the certificate has been revoked. You can override this behavior by registering a RemoteCertificateValidationCallback callback, as shown in the following code.

ServicePointManager.ServerCertificateValidationCallback +=
    new RemoteCertificateValidationCallback(ValidateServerCertificate);


The signature of ValidateServerCertificate is as follows:

public static bool ValidateServerCertificate(
  object sender,
  X509Certificate certificate,
  X509Chain chain,
  SslPolicyErrors sslPolicyErrors)


By implementing ValidateServerCertificate, client application developers can perform whatever checks they consider necessary to validate the service certificate.

When using SSL (Secure Sockets Layer) over TCP or SOAP message security, the client certificate is validated according to the value of the CertificateValidationMode property of the X509ClientCertificateAuthentication class. The property is set to one of the X509CertificateValidationMode values. Revocation checking is performed according to the value of the RevocationMode property of the X509ClientCertificateAuthentication class, which is set to one of the X509RevocationMode values.

myServiceHost.Credentials.ClientCertificate.Authentication.
    CertificateValidationMode=
    X509CertificateValidationMode.PeerOrChainTrust;       

myServiceHost.Credentials.ClientCertificate.Authentication.
    RevocationMode=X509RevocationMode.Offline; 


When using SSL over TCP or SOAP message security, the service certificate is validated according to the value of the CertificateValidationMode property of the X509ServiceCertificateAuthentication class. The property is set to one of the X509CertificateValidationMode values.

Revocation checking is performed according to the value of the RevocationMode property of the X509ServiceCertificateAuthentication class, which is set to one of the X509RevocationMode values.

myClient.ClientCredentials.ServiceCertificate.
    Authentication.CertificateValidationMode=
    X509CertificateValidationMode.PeerOrChainTrust;
myClient.ClientCredentials.ServiceCertificate.Authentication.
    RevocationMode = X509RevocationMode.Offline;


Source: http://msdn.microsoft.com/ko-kr/library/aa702579.aspx






Starting with Fedora 16, services are managed with the systemctl command. The older commands still work, but using systemctl is recommended. This post summarizes the main ways to use systemctl.

Checking service status

To check the status of a service:

systemctl status service_name.service

For example, to check the status of httpd:

systemctl status httpd.service

Starting, stopping, and restarting services

The basic command to start a service is:

systemctl start service_name.service

To stop a service, use stop instead of start; to restart, use restart. For example, to start httpd:

systemctl start httpd.service

To stop or restart it:

systemctl stop httpd.service
systemctl restart httpd.service

Starting a service automatically at boot

To make a service start automatically when the system boots:

systemctl enable service_name.service

For example, to make httpd start automatically at boot:

systemctl enable httpd.service

To disable automatic startup, use disable instead of enable.
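
To check whether a service is currently set to start at boot, you can also use is-enabled, which prints enabled or disabled (httpd shown here as an example):

systemctl is-enabled httpd.service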

Listing running services

To list all running services:

systemctl list-units --type=service


Source: http://www.cmsfactory.net/node/62






Getting started with systemd

systemd is the next-generation Linux system daemon that replaces the traditional init daemon.
Red Hat ES 6 is still init-based, but systemd will be introduced in ES 7, and other distributions have already adopted it or plan to.
Let's take a quick look at systemd!
Note: the "/etc/inittab" file is no longer used!

1. Viewing the systemd journal log
# journalctl
-> You no longer need to run a syslog daemon; if you don't use syslog's features, just use the journal. First create the directory with # mkdir /var/log/journal/. (Example queries appear after this list.)

2. Enabling/disabling services
# systemctl enable <service_name>.service
# systemctl disable <service_name>.service

3. Controlling services
systemctl start <service_name>.service
systemctl stop <service_name>.service
systemctl restart <service_name>.service
systemctl reload <service_name>.service
# systemctl status <service_name>.service


4. Removing/registering services (mask/unmask)
systemctl mask <service_name>.service
systemctl unmask <service_name>.service
systemctl daemon-reload


5. Changing the runlevel
systemctl isolate runlevel<NUM>.target


6. Listing services
systemctl list-units --no-pager --type=service
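
A few common queries for items 1 and 6 above (illustrative; the available options depend on your systemd version):

# Follow the journal for a single service, like tail -f
journalctl -u httpd.service -f

# Show only messages from the current boot
journalctl -b

# List all unit files and whether they are enabled
systemctl list-unit-files --type=service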



Source: http://joonlinux.blogspot.kr/2012/07/systemd.html




