To share Code style schemes in IntelliJ, do as follows:
File -> Export Settings... -> Select None -> Code style schemes -> OK
Reference:
https://www.jetbrains.com/help/idea/2016.2/exporting-and-importing-settings.html
Friday, July 29, 2016
Install spaCy
To install spaCy, do as follows:
Johnnyui-MacBook-Pro:~ izeye$ python -m pip install -U pip virtualenv
...
Johnnyui-MacBook-Pro:~ izeye$ virtualenv .env -p python2
Running virtualenv with interpreter /Library/Frameworks/Python.framework/Versions/2.7/bin/python2
New python executable in /Users/izeye/.env/bin/python
Installing setuptools, pip, wheel...done.
Johnnyui-MacBook-Pro:~ izeye$ source .env/bin/activate
(.env) Johnnyui-MacBook-Pro:~ izeye$ pip install spacy
(.env) Johnnyui-MacBook-Pro:~ izeye$ python
Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec 5 2015, 12:54:16)
[GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import spacy
>>>
(.env) Johnnyui-MacBook-Pro:~ izeye$ python -m spacy.en.download
Downloading...
Downloaded 532.28MB 100.00% 0.24MB/s eta 0s
archive.gz checksum/md5 OK
Model successfully installed.
(.env) Johnnyui-MacBook-Pro:~ izeye$ python -c "import spacy; spacy.load('en'); print('OK')"
OK
(.env) Johnnyui-MacBook-Pro:~ izeye$ python -c "import os; import spacy; print(os.path.dirname(spacy.__file__))"
/Users/izeye/.env/lib/python2.7/site-packages/spacy
(.env) Johnnyui-MacBook-Pro:~ izeye$ python -m pip install -U pytest
...
...
(.env) Johnnyui-MacBook-Pro:~ izeye$ python -m pytest /Users/izeye/.env/lib/python2.7/site-packages/spacy --vectors --model --slow
...
(.env) Johnnyui-MacBook-Pro:~ izeye$
Reference:
https://spacy.io/docs#getting-started
Checkstyle RightCurly alone with IntelliJ
To make Checkstyle's `RightCurly` check with the `alone` option happy in IntelliJ, do as follows:
File -> Settings... -> Code Style -> Java -> Wrapping and Braces
* 'if()' statement
'else' on new line -> true
* 'try' statement
'catch' on new line -> true
'finally' on new line -> true
Then run Code -> Reformat Code...
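With these options enabled, reformatted code puts the closing brace on a line of its own before `else`, `catch`, and `finally`, which is what `RightCurly` with the `alone` option expects. A minimal sketch of the resulting style (the class and method bodies are made up for illustration):
```java
public class RightCurlyAloneExample {

    void demo(boolean flag) {
        if (flag) {
            System.out.println("if branch");
        }
        else {
            System.out.println("else branch");
        }

        try {
            System.out.println("try block");
        }
        catch (RuntimeException ex) {
            System.out.println("catch block: " + ex);
        }
        finally {
            System.out.println("finally block");
        }
    }

}
```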
ERROR: virtualenv is not compatible with this system or executable
I got the following errors:
$ virtualenv .env
Using base prefix '/Users/izeye/anaconda'
New python executable in /Users/izeye/.env/bin/python
ERROR: The executable /Users/izeye/.env/bin/python is not functioning
ERROR: It thinks sys.prefix is '/Users/izeye' (should be '/Users/izeye/.env')
ERROR: virtualenv is not compatible with this system or executable
$
I just gave up on using Python 3 and worked around the problem with Python 2 as follows:
$ virtualenv .env -p python2
Running virtualenv with interpreter /Library/Frameworks/Python.framework/Versions/2.7/bin/python2
New python executable in /Users/izeye/.env/bin/python
Installing setuptools, pip, wheel...done.
$
Add @author tags for Javadoc comments in IntelliJ
To add @author tags for Javadoc comments in IntelliJ, do as follows:
Preferences... -> File and Code Templates -> Includes -> File Header
/**
* Fill me.
*
* @author Johnny Lim
*/
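With this file header include in place, every new Java file IntelliJ generates starts with that comment. For example, a newly created class (hypothetical name) looks like this:
```java
/**
 * Fill me.
 *
 * @author Johnny Lim
 */
public class SomeNewService {
}
```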
Wednesday, July 27, 2016
Use CheckStyle in IntelliJ
To use CheckStyle in IntelliJ, do as follows:
File -> Settings... -> CheckStyle
Add a CheckStyle configuration file and activate it.
Open the `Checkstyle` tool window and click `Check Project`.
Apply a Copyright comment to all Java source files in IntelliJ
To apply a Copyright comment to all Java source files in IntelliJ, do as follows:
IntelliJ IDEA -> Preferences...
Copyright -> Copyright Profiles
Add a profile as follows:
```
Copyright 2016 the original author or authors.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
```
In `Copyright`, select the new profile for `Default project copyright`.
Add a scope.
Finally, apply the Copyright as follows:
`src/main/java` -> Update Copyright...
`src/test/java` -> Update Copyright...
Reference:
https://www.jetbrains.com/help/idea/2016.1/generating-and-updating-copyright-notice.html
Monday, July 25, 2016
IllegalArgumentException[No custom metadata prototype registered for type [licenses], node like missing plugins]
If you encounter the following error:
[2016-07-25 16:23:24,384][INFO ][discovery.zen ] [Alex Wilder] failed to send join request to master [{Surtur}{8l2V-7MmSvKyC4oChA1gPA}{1.2.3.4}{1.2.3.4:9300}], reason [RemoteTransportException[[Surtur][1.2.3.4:9300][internal:discovery/zen/join]]; nested: IllegalStateException[failure when sending a validation request to node]; nested: RemoteTransportException[[Alex Wilder][1.2.3.5:9300][internal:discovery/zen/join/validate]]; nested: IllegalArgumentException[No custom metadata prototype registered for type [licenses], node like missing plugins]; ]
Install the missing plugins as follows:
./bin/plugin install license
./bin/plugin install marvel-agent
Thursday, July 21, 2016
AWK fields and `if` sample
This is an AWK fields and `if` sample:
cat logs/user_agent/user_agent.log | awk 'BEGIN { FS = "\t" }; { if ($1 == "1234") print $2 }' > user_agent_pc.txt
References:
https://www.gnu.org/software/gawk/manual/html_node/Field-Separators.html
http://www.thegeekstuff.com/2010/02/awk-conditional-statements/?utm_source=feedburner&utm_medium=feed&utm_campaign=Feed%3A+TheGeekStuff+(The+Geek+Stuff)
Show a histogram of live objects in the Java heap
To show a histogram of live objects in the Java heap, run `jmap` as follows (where `1234` is the target JVM's process ID):
jmap -histo:live 1234
Monday, July 18, 2016
Disable replicas of a new index in Elasticsearch
To disable replicas of a new index in Elasticsearch, do as follows:
curl -XPUT 'localhost:9200/_template/logstash_template' -d '
{
"template" : "logstash-*",
"settings" : {
"number_of_replicas" : 0
}
}'
Reference:
http://stackoverflow.com/questions/24553718/updating-the-default-index-number-of-replicas-setting-for-new-indices
Disable replicas of an existing index in Elasticsearch
To disable replicas of an existing index in Elasticsearch, do as follows:
curl -XPUT 'localhost:9200/logstash-2016.07.18/_settings' -d '
{
"index" : {
"number_of_replicas" : 0
}
}'
Reference:
https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-update-settings.html
Set up an Elasticsearch cluster
Add the following configuration to `config/elasticsearch.yml` in each instance of Elasticsearch:
cluster:
name: some-log
network:
host:
- _eth1_
- _local_
discovery.zen.ping.unicast.hosts: ["1.2.3.4", "1.2.3.5", "1.2.3.6", "1.2.3.7", "1.2.3.8"]
discovery.zen.minimum_master_nodes: 1
Note that `discovery.zen.minimum_master_nodes` is set to 1 above only for simplicity. Based on the recommended formula (total number of nodes / 2 + 1), it should be 3 for this five-node cluster:
# Prevent the "split brain" by configuring the majority of nodes (total number of nodes / 2 + 1):
#
# discovery.zen.minimum_master_nodes: 3
Thursday, July 14, 2016
Change Elasticsearch heap size
To change Elasticsearch heap size, use the `ES_HEAP_SIZE` environment variable as follows:
ES_HEAP_SIZE=8g ./bin/elasticsearch
Reference:
https://www.elastic.co/guide/en/elasticsearch/guide/current/heap-sizing.html
Wednesday, July 13, 2016
Install Marvel
Install Marvel into Elasticsearch and Kibana as follows:
cd programs/elasticsearch-2.3.3
./bin/plugin install license
./bin/plugin install marvel-agent
cd ../kibana-4.5.1-linux-x64
./bin/kibana plugin --install elasticsearch/marvel/latest
Restart Elasticsearch and Kibana.
Check the following URL:
http://localhost:5601/app/marvel
Reference:
https://www.elastic.co/kr/downloads/marvel
Show all documents in an index in Elasticsearch
To show all documents in an index in Elasticsearch, do as follows:
$ curl 'localhost:9200/logstash/_search?pretty=true&q=*:*'
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 5,
"successful" : 5,
"failed" : 0
},
"hits" : {
"total" : 1,
"max_score" : 1.0,
"hits" : [ {
"_index" : "logstash",
"_type" : "logstash",
"_id" : "AVXjVaB4eRCf5XO_Qkwg",
"_score" : 1.0,
"_source" : {
"firstName" : "Johnny",
"lastName" : "Lim"
}
} ]
}
}
$
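The same match-all query can also be issued from Java using only the JDK. A minimal sketch (assuming Elasticsearch listens on `localhost:9200` and the index is named `logstash`, as above):
```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;

public class MatchAllSearch {

    public static void main(String[] args) throws Exception {
        // Same match-all query as the curl command above.
        URL url = new URL("http://localhost:9200/logstash/_search?pretty=true&q=*:*");
        HttpURLConnection connection = (HttpURLConnection) url.openConnection();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(connection.getInputStream(), "UTF-8"))) {
            String line;
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        }
        finally {
            connection.disconnect();
        }
    }

}
```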
Reference:
http://stackoverflow.com/questions/8829468/elasticsearch-query-to-return-all-records
Monday, July 11, 2016
ZooKeeper Hello, world!
Install ZooKeeper as follows:
tar zxvf zookeeper-3.4.8.tar.gz
Set up and run ZooKeeper as follows:
cd zookeeper-3.4.8
Create `conf/zoo.cfg` with the following content:
tickTime=2000
dataDir=/Users/izeye/zookeeper-data
clientPort=2181
./bin/zkServer.sh start
Test ZooKeeper as follows:
./bin/zkCli.sh
[zk: localhost:2181(CONNECTED) 0] help
ZooKeeper -server host:port cmd args
stat path [watch]
set path data [version]
ls path [watch]
delquota [-n|-b] path
ls2 path [watch]
setAcl path acl
setquota -n|-b val path
history
redo cmdno
printwatches on|off
delete path [version]
sync path
listquota path
rmr path
get path [watch]
create [-s] [-e] path data acl
addauth scheme auth
quit
getAcl path
close
connect host:port
[zk: localhost:2181(CONNECTED) 1] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 2] create /zk_test my_data
Created /zk_test
[zk: localhost:2181(CONNECTED) 3] ls /
[zookeeper, zk_test]
[zk: localhost:2181(CONNECTED) 4] get /zk_test
my_data
cZxid = 0x11
ctime = Mon Jul 11 21:03:22 KST 2016
mZxid = 0x11
mtime = Mon Jul 11 21:03:22 KST 2016
pZxid = 0x11
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 7
numChildren = 0
[zk: localhost:2181(CONNECTED) 5] set /zk_test junk
cZxid = 0x11
ctime = Mon Jul 11 21:03:22 KST 2016
mZxid = 0x12
mtime = Mon Jul 11 21:05:11 KST 2016
pZxid = 0x11
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0
[zk: localhost:2181(CONNECTED) 6] get /zk_test
junk
cZxid = 0x11
ctime = Mon Jul 11 21:03:22 KST 2016
mZxid = 0x12
mtime = Mon Jul 11 21:05:11 KST 2016
pZxid = 0x11
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 4
numChildren = 0
[zk: localhost:2181(CONNECTED) 7] delete /zk_test
[zk: localhost:2181(CONNECTED) 8] ls /
[zookeeper]
[zk: localhost:2181(CONNECTED) 9]
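The same create/read/update/delete cycle can also be driven from the ZooKeeper Java client. Here is a minimal sketch (assuming the ZooKeeper 3.4.x client jar is on the classpath and the standalone server above is running on `localhost:2181`):
```java
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.ZooDefs;
import org.apache.zookeeper.ZooKeeper;

public class ZooKeeperHelloWorld {

    public static void main(String[] args) throws Exception {
        // Wait until the session is established before issuing requests.
        final CountDownLatch connected = new CountDownLatch(1);
        ZooKeeper zk = new ZooKeeper("localhost:2181", 10000, new Watcher() {
            @Override
            public void process(WatchedEvent event) {
                if (event.getState() == Event.KeeperState.SyncConnected) {
                    connected.countDown();
                }
            }
        });
        connected.await();

        // Same lifecycle as the zkCli.sh session above: create, get, set, delete.
        zk.create("/zk_test", "my_data".getBytes(),
                ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
        System.out.println(new String(zk.getData("/zk_test", false, null)));

        zk.setData("/zk_test", "junk".getBytes(), -1);
        System.out.println(new String(zk.getData("/zk_test", false, null)));

        zk.delete("/zk_test", -1);
        zk.close();
    }

}
```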
Reference:
https://zookeeper.apache.org/doc/r3.4.8/zookeeperStarted.html
Friday, July 8, 2016
How to change Logstash's default max heap size
To change Logstash's default max heap size, do as follows:
LS_HEAP_SIZE=4g ./bin/logstash -f generator.conf
You can check if it works with `jps -v` as follows:
$ jps -v
15582 Main -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Xmx4g -Xss2048k -Djffi.boot.library.path=/home/izeye/programs/logstash-2.3.4/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/izeye/programs/logstash-2.3.4/heapdump.hprof -Xbootclasspath/a:/home/izeye/programs/logstash-2.3.4/vendor/jruby/lib/jruby.jar -Djruby.home=/home/izeye/programs/logstash-2.3.4/vendor/jruby -Djruby.lib=/home/izeye/programs/logstash-2.3.4/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh
15646 Jps -Dapplication.home=/home/izeye/programs/jdk1.8.0_45 -Xms8m
$
You can see `-Xmx4g` (i.e. 4 GB).
Reference:
https://www.elastic.co/guide/en/logstash/current/command-line-flags.html
Logstash's default max heap size
To find out Logstash's default max heap size, use `jps -v` as follows:
$ jps -v
15396 Main -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -Xmx1g -Xss2048k -Djffi.boot.library.path=/home/izeye/programs/logstash-2.3.4/vendor/jruby/lib/jni -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -Djava.awt.headless=true -XX:CMSInitiatingOccupancyFraction=75 -XX:+UseCMSInitiatingOccupancyOnly -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/izeye/programs/logstash-2.3.4/heapdump.hprof -Xbootclasspath/a:/home/izeye/programs/logstash-2.3.4/vendor/jruby/lib/jruby.jar -Djruby.home=/home/izeye/programs/logstash-2.3.4/vendor/jruby -Djruby.lib=/home/izeye/programs/logstash-2.3.4/vendor/jruby/lib -Djruby.script=jruby -Djruby.shell=/bin/sh
15460 Jps -Dapplication.home=/home/izeye/programs/jdk1.8.0_45 -Xms8m
$
You can see `-Xmx1g` (i.e. 1 GB).
The result is from Logstash 2.3.4.
How to get JVM default max heap size
To get the JVM's default max heap size, use the following command:
$ java -XX:+PrintFlagsFinal -version | grep HeapSize
uintx ErgoHeapSizeLimit = 0 {product}
uintx HeapSizePerGCThread = 87241520 {product}
uintx InitialHeapSize := 262144000 {product}
uintx LargePageHeapSizeThreshold = 134217728 {product}
uintx MaxHeapSize := 4179623936 {product}
java version "1.8.0_45"
Java(TM) SE Runtime Environment (build 1.8.0_45-b14)
Java HotSpot(TM) 64-Bit Server VM (build 25.45-b02, mixed mode)
$
In this case, `MaxHeapSize` is 4179623936 bytes, which is about 4 GB.
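You can also check the limit from inside a running JVM with `Runtime.getRuntime().maxMemory()`, which reports roughly the same value (it can be slightly lower than `MaxHeapSize` because the JVM excludes some internal space). A minimal sketch:
```java
public class MaxHeapSize {

    public static void main(String[] args) {
        // Roughly corresponds to the MaxHeapSize flag shown above.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap size: " + maxBytes + " bytes (~"
                + maxBytes / (1024 * 1024) + " MB)");
    }

}
```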
Reference:
http://stackoverflow.com/questions/12797560/command-line-tool-to-find-java-heap-size-and-memory-used-linux
How to get VM parameters of a running Java process
To get the VM parameters of a running Java process, use `jps -v` as follows:
$ jps -v
15286 Jps -Dapplication.home=/home/izeye/programs/jdk1.8.0_45 -Xms8m
$
How to pass an inline environment variable to an application in Linux
To pass an inline environment variable to an application in Linux, do as follows:
$ LS_HEAP_SIZE=4g ./some-script.sh
4g
$ echo $LS_HEAP_SIZE
$
`some-script.sh` simply echoes the environment variable:
echo $LS_HEAP_SIZE
Note that the environment variable is not set in the shell afterwards, as the empty `echo` output above shows.
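The same mechanism works for any child process, not just shell scripts; a Java program started this way sees the variable via `System.getenv()`. A minimal sketch (hypothetical class name):
```java
public class PrintEnv {

    public static void main(String[] args) {
        // Run as: LS_HEAP_SIZE=4g java PrintEnv
        System.out.println(System.getenv("LS_HEAP_SIZE"));
    }

}
```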
How to unset an environment variable set by `export` in Linux
To unset an environment variable set by `export` in Linux, use `unset` as follows:
$ export LS_HEAP_SIZE=4g
$ echo $LS_HEAP_SIZE
4g
$ unset LS_HEAP_SIZE
$ echo $LS_HEAP_SIZE
$
Reference:
http://stackoverflow.com/questions/6877727/how-do-i-delete-unset-an-exported-environment-variable
Benchmark Logstash Kafka input plugin with no-op output except metrics
Test environment is as follows:
```
CPU: Intel L5640 2.26 GHz 6 cores * 2 EA
Memory: SAMSUNG PC3-10600R 4 GB * 4 EA
HDD: TOSHIBA SAS 10,000 RPM 300 GB * 6 EA
OS: CentOS release 6.6 (Final)
Logstash 2.3.4
```
I used the following configuration:
```
input {
kafka {
zk_connect => '1.2.3.4:2181'
topic_id => 'some-log'
consumer_threads => 1
}
}
filter {
metrics {
meter => "events"
add_tag => "metric"
}
}
output {
if "metric" in [tags] {
stdout { codec => line {
format => "Count: %{[events][count]}"
}
}
}
}
```
I got the following result:
```
./bin/logstash -f some-log-kafka.conf
Settings: Default pipeline workers: 24
Pipeline main started
Count: 9614
Count: 23080
Count: 37087
Count: 50815
Count: 64517
Count: 78296
Count: 91977
Count: 105990
```
The default `flush_interval` is 5 seconds, so this is roughly 14K events per 5 seconds (2.8K per second).
With `consumer_threads` set to 10, I got the following result:
```
./bin/logstash -f impression-log-kafka.conf
Settings: Default pipeline workers: 24
Pipeline main started
Count: 9599
Count: 23254
Count: 37253
Count: 51029
Count: 64881
Count: 78868
Count: 92663
Count: 106267
```
It looks like increasing `consumer_threads` doesn't make much difference.
Based on a benchmark of my simple no-op consumer built with the Kafka client Java library on the same machine, I expected around 30K per second (and at least 10K), but this is only about 1/10 of the expected performance.
I'm not sure whether this could be improved by tuning the configuration.
As a baseline, I tested with the `generator` input as follows:
```
input {
generator { }
}
filter {
metrics {
meter => "events"
add_tag => "metric"
}
}
output {
#stdout { }
if "metric" in [tags] {
stdout { codec => line { format => "Count: %{[events][count]}"
}
}
}
}
```
I got the following result:
```
./bin/logstash -f generator.conf
Settings: Default pipeline workers: 24
Pipeline main started
Count: 200584
Count: 424425
Count: 651640
Count: 881605
Count: 1110150
```
That's roughly 220K events per 5 seconds (44K per second). It's not as good as I expected, since my simple no-op consumer built with the Kafka client Java library consumed 30K to 50K per second.
What am I missing here?
References:
https://www.elastic.co/guide/en/logstash/current/plugins-filters-metrics.html
http://izeye.blogspot.kr/2016/07/benchmark-simple-no-op-kafka-consumer.html
Benchmark a simple no-op Kafka consumer using Kafka client Java library
Test environment is as follows:
```
CPU: Intel L5640 2.26 GHz 6 cores * 2 EA
Memory: SAMSUNG PC3-10600R 4 GB * 4 EA
HDD: TOSHIBA SAS 10,000 RPM 300 GB * 6 EA
OS: CentOS release 6.6 (Final)
Kafka server 0.9.0.0
Kafka client Java library 0.9.0.1
```
I used a custom tool as follows:
```
git clone https://github.com/izeye/kafka-consumer.git
cd kafka-consumer/
./gradlew clean bootRepackage
java -jar build/libs/kafka-consumer-1.0.jar --spring.profiles.active=noop --kafka.consumer.bootstrap-servers=1.2.3.4:9092 --kafka.consumer.group-id=logstash --kafka.consumer.topic=some-log
```
I got the following result:
```
# of consumed logs per second: 29531
# of consumed logs per second: 38848
# of consumed logs per second: 28747
# of consumed logs per second: 49191
# of consumed logs per second: 28797
```
It consumed from 30K to 50K logs per second.
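The actual tool lives in the repository above; the core of a no-op consumer built on the Kafka 0.9 client Java library looks roughly like the following sketch (the broker address, group id, and topic are taken from the command line above, and the per-second reporting is simplified):
```java
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NoOpKafkaConsumer {

    public static void main(String[] args) {
        Properties properties = new Properties();
        properties.put("bootstrap.servers", "1.2.3.4:9092");
        properties.put("group.id", "logstash");
        properties.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        properties.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<String, String>(properties);
        consumer.subscribe(Collections.singletonList("some-log"));

        long count = 0;
        long lastReportTime = System.currentTimeMillis();
        while (true) {
            // No-op: just count the consumed records without processing them.
            ConsumerRecords<String, String> records = consumer.poll(1000);
            count += records.count();

            long now = System.currentTimeMillis();
            if (now - lastReportTime >= 1000) {
                System.out.println("# of consumed logs per second: " + count);
                count = 0;
                lastReportTime = now;
            }
        }
    }

}
```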
org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 552313, only 36 bytes available
If you try to connect from Kafka client 0.10.0.0 to a Kafka 0.9.0.0 server, you will get the following exception:
Caused by: org.apache.kafka.common.protocol.types.SchemaException: Error reading field 'topic_metadata': Error reading array of size 552313, only 36 bytes available
at org.apache.kafka.common.protocol.types.Schema.read(Schema.java:73) ~[kafka-clients-0.10.0.0.jar:na]
at org.apache.kafka.clients.NetworkClient.parseResponse(NetworkClient.java:380) ~[kafka-clients-0.10.0.0.jar:na]
at org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:449) ~[kafka-clients-0.10.0.0.jar:na]
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:269) ~[kafka-clients-0.10.0.0.jar:na]
Changing Kafka client version to 0.9.0.1 solves the problem.
Thursday, July 7, 2016
How to extract a range of lines in a text file to another file in Linux
To extract a range of lines from a text file to another file in Linux (lines 1000 through 2000 in this example), use the following command:
sed -n '1000,2000p' some.log > new.log
List Kafka consumer groups
To list Kafka consumer groups, use the following command:
./bin/kafka-consumer-groups.sh --zookeeper localhost:2181 --list
Transfer logs from Kafka to Elasticsearch via Logstash
You can transfer logs from Kafka to Elasticsearch via Logstash with the following configuration:
input {
kafka {
topic_id => 'some_log'
}
}
filter {
grok {
patterns_dir => ["./patterns"]
match => { "message" => "%{INT:log_version}\t%{INT:some_id}\t%{DATA:some_field}\t%{GREEDYDATA:last_field}" }
}
if [some_id] not in ["1", "2", "3"] {
drop { }
}
}
output {
elasticsearch {
hosts => [ "1.2.3.4:9200" ]
}
#stdout {
#codec => json
# codec => rubydebug
#}
}
Note that the last field can't be `DATA`. If you use `DATA`, the last field won't be parsed.
Reference:
http://stackoverflow.com/questions/38240392/logstash-grok-filter-doesnt-work-for-the-last-field
How to insert a tab in Mac terminal
To insert a literal tab character in the Mac terminal, do as follows:
Press control + `V`, then press tab.
Reference:
https://discussions.apple.com/thread/2225213?tstart=0