Shipping Apache access logs to Elasticsearch

Elasticsearch runs in Docker containers. For the Apache side, I installed td-agent into a WordPress container I happened to have lying around and tested with that.

  • td-agent 0.12.12
  • Elasticsearch 1.7.1
Installing td-agent
curl -L https://toolbelt.treasuredata.com/sh/install-ubuntu-trusty-td-agent2.sh | sh
Checking the output on stdout

The apache2 format template shown below is provided by default, so I use it as is.

format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
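
To get a feel for what this template extracts, here is a minimal Python sketch of the same pattern. It is only an illustration, not how td-agent does it internally: the regex is rewritten with Python's `(?P<name>...)` group syntax, the sample line is made up, and the handling of "-" and of the numeric fields is an assumption based on the stdout output further down in this post.

```python
import re
from datetime import datetime

# Python translation of the apache2 format above; Python spells named
# groups (?P<name>...) where the Ruby original uses (?<name>...).
APACHE2 = re.compile(
    r'^(?P<host>[^ ]*) [^ ]* (?P<user>[^ ]*) \[(?P<time>[^\]]*)\] '
    r'"(?P<method>\S+)(?: +(?P<path>[^ ]*) +\S*)?" '
    r'(?P<code>[^ ]*) (?P<size>[^ ]*)'
    r'(?: "(?P<referer>[^"]*)" "(?P<agent>[^"]*)")?$'
)
TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"


def parse_line(line):
    """Return (timestamp, record) for one access-log line, or None."""
    m = APACHE2.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    timestamp = datetime.strptime(fields.pop("time"), TIME_FORMAT)
    # Assumption: "-" becomes None and code/size become integers,
    # mirroring the values visible in the stdout output of this post.
    record = {k: (None if v == "-" else v) for k, v in fields.items()}
    for key in ("code", "size"):
        if record[key] is not None:
            record[key] = int(record[key])
    return timestamp, record


# Hypothetical sample line in Apache combined log format.
sample = ('192.168.1.1 - - [06/Oct/2015:22:25:17 +0900] '
          '"POST /wp-admin/admin-ajax.php HTTP/1.1" 200 580 '
          '"http://192.168.1.10/wp-admin/index.php" "Mozilla/5.0"')
timestamp, record = parse_line(sample)
print(timestamp.isoformat(), record)
```

The time field is taken out of the record and used as the event timestamp, which is why the records printed later have no "time" key.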

/etc/td-agent/td-agent.conf

<source>
  type tail
  format apache2
  path /var/log/apache2/access.log
  pos_file /var/log/td-agent/apache_access.pos
  tag raw.apache.access
</source>

<match *.**>
  type copy
  <store>
    type stdout
  </store>
</match>

Start td-agent

/etc/init.d/td-agent start

Because everything else in the Docker container runs as root, td-agent cannot read the log...

2015-10-04 17:16:11 +0000 [error]: Permission denied @ rb_sysopen - /var/log/apache2/access.log
  2015-10-04 17:16:11 +0000 [error]: suppressed same stacktrace

Quick-and-dirty workaround: run td-agent as root
/etc/init.d/td-agent

 :
TD_AGENT_USER=root
TD_AGENT_GROUP=root
 :

This is what came out:

2015-10-06 22:25:17 +0900 raw.apache.access: {"host":"192.168.1.1","user":null,"method":"POST","path":"/wp-admin/admin-ajax.php","code":200,"size":580,"referer":"http://192.168.1.10/wp-admin/index.php","agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36"}
2015-10-06 22:25:35 +0900 raw.apache.access: {"host":"192.168.1.1","user":null,"method":"GET","path":"/wp-admin/index.php","code":200,"size":14048,"referer":"http://192.168.1.10/wp-admin/index.php","agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3"}
2015-10-06 22:25:55 +0900 raw.apache.access: {"host":"192.168.1.1","user":null,"method":"GET","path":"/wp-admin/index.php","code":200,"size":14046,"referer":"http://192.168.1.10/wp-admin/index.php","agent":"Mozilla/5.0 (iPad; CPU OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3"}

Parsing the user agent

tagomoris/fluent-plugin-woothee · GitHub

# td-agent-gem install fluent-plugin-woothee
WARN: Unresolved specs during Gem::Specification.reset:
      json (>= 1.4.3)
WARN: Clearing out unresolved specs.
Please report a bug if this causes problems.
ERROR:  While executing gem ... (Gem::RemoteFetcher::FetchError)
    Errno::ECONNREFUSED: Connection refused - connect(2) for "your-dns-needs-immediate-attention.dev" port 443 (https://your-dns-needs-immediate-attention.dev/quick/Marshal.4.8/woothee-1.2.0.gemspec.rz)

Got a strange error. Apparently it is because the containers are set up under a .dev domain.
your-dns-needs-immediate-attention | Triple-networks

# echo "search home.local" >> /etc/resolv.conf 

/etc/td-agent/td-agent.conf

<source>
  type tail
  format apache2
  path /var/log/apache2/access.log
  pos_file /var/log/td-agent/apache_access.pos
  tag raw.apache.access
</source>

<match raw.**>
  type woothee
  key_name agent
  remove_prefix raw
  add_prefix parsed
  merge_agent_info yes
  out_key_name agent_name
  out_key_category agent_category
  out_key_os agent_os
  out_key_os_version agent_os_version
  out_key_version agent_version
  out_key_vendor agent_vendor
</match>

<match *.**>
  type copy
  <store>
    type stdout
  </store>
</match>

The result:

2015-10-06 23:17:10 +0900 parsed.apache.access: {"host":"192.168.1.1","user":null,"method":"GET","path":"/wp-admin/index.php","code":200,"size":14014,"referer":"http://192.168.1.10/wp-admin/plugins.php","agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36","agent_name":"Chrome","agent_category":"pc","agent_os":"Mac OSX","agent_os_version":"10.10.5","agent_version":"45.0.2454.101","agent_vendor":"Google"}
2015-10-06 23:18:17 +0900 parsed.apache.access: {"host":"192.168.1.1","user":null,"method":"GET","path":"/wp-admin/index.php","code":200,"size":14051,"referer":"http://192.168.1.10/wp-admin/index.php","agent":"Mozilla/5.0 (iPhone; CPU iPhone OS 5_0 like Mac OS X) AppleWebKit/534.46 (KHTML, like Gecko) Version/5.1 Mobile/9A334 Safari/7534.48.3","agent_name":"Safari","agent_category":"smartphone","agent_os":"iPhone","agent_os_version":"5.0","agent_version":"5.1","agent_vendor":"Apple"}
2015-10-06 23:19:11 +0900 parsed.apache.access: {"host":"192.168.1.1","user":null,"method":"GET","path":"/wp-admin/index.php","code":200,"size":14049,"referer":"http://192.168.1.10/wp-admin/index.php","agent":"Mozilla/5.0 (Linux; U; Android 4.0.4; en-gb; GT-I9300 Build/IMM76D) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Mobile Safari/534.30","agent_name":"Safari","agent_category":"smartphone","agent_os":"Android","agent_os_version":"4.0.4","agent_version":"4.0","agent_vendor":"Apple"}
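
The tag changes from raw.apache.access to parsed.apache.access because of remove_prefix and add_prefix. Without that rewrite, the re-emitted event would match `<match raw.**>` again and loop. Below is a rough Python model of the rewrite as I understand it; `rewrite_tag` is a name made up for this sketch, not something from the plugin.

```python
def rewrite_tag(tag, remove_prefix=None, add_prefix=None):
    """Strip remove_prefix from the front of tag, then prepend add_prefix.

    Illustrative model only; the plugin's own handling may differ in
    edge cases.
    """
    parts = tag.split(".")
    if remove_prefix:
        prefix_parts = remove_prefix.split(".")
        # Only strip when the tag really starts with the prefix.
        if parts[:len(prefix_parts)] == prefix_parts:
            parts = parts[len(prefix_parts):]
    if add_prefix:
        parts = add_prefix.split(".") + parts
    return ".".join(parts)


print(rewrite_tag("raw.apache.access", remove_prefix="raw", add_prefix="parsed"))
```

With the settings from the config above, this prints parsed.apache.access, which is the tag the final `<match *.**>` block then picks up.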

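In addition, filter_categories and drop_categories let you pass through or discard only specific categories.
To discard crawler requests efficiently but roughly, use 'woothee_fast_crawler_filter'. To discard them thoroughly, use 'woothee' together with 'drop_categories crawler'.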
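
As a rough mental model of filter_categories and drop_categories (not the plugin's actual code), the decision per record can be sketched like this in Python. The function name `keep_record` and the sample records are made up, and I assume only one of the two options is set at a time.

```python
def keep_record(record, filter_categories=None, drop_categories=None,
                key="agent_category"):
    """Decide whether a record parsed by woothee should be emitted.

    filter_categories: emit only records whose category is listed.
    drop_categories:   emit everything except the listed categories.
    """
    category = record.get(key)
    if filter_categories is not None:
        return category in filter_categories
    if drop_categories is not None:
        return category not in drop_categories
    return True


# Hypothetical records, reduced to the one field that matters here.
records = [
    {"agent_name": "Chrome", "agent_category": "pc"},
    {"agent_name": "Safari", "agent_category": "smartphone"},
    {"agent_name": "Googlebot", "agent_category": "crawler"},
]

kept = [r for r in records if keep_record(r, drop_categories=["crawler"])]
print([r["agent_name"] for r in kept])
```

With drop_categories set to crawler, only the pc and smartphone records remain.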

Trying geoip as well

y-ken/fluent-plugin-geoip · GitHub

# apt-get install build-essential
# apt-get install libgeoip-dev
# td-agent-gem install fluent-plugin-geoip

With the free database that is bundled by default, lookups are at country level (latitude and longitude are available too).

/etc/td-agent/td-agent.conf

<source>
  type tail
  format apache2
  path /var/log/apache2/access.log
  pos_file /var/log/td-agent/apache_access.pos
  tag raw.apache.access
</source>

<match raw.**>
  type woothee
  key_name agent
  remove_prefix raw
  add_prefix ua_parsed
  merge_agent_info yes
  out_key_name agent_name
  out_key_category agent_category
  out_key_os agent_os
  out_key_os_version agent_os_version
  out_key_version agent_version
  out_key_vendor agent_vendor
</match>

<match ua_parsed.**>
  type geoip

  # Specify one or more geoip lookup field which has ip address (default: host)
  # in the case of accessing nested value, delimit keys by dot like 'host.ip'.
  geoip_lookup_key  host

  # Specify optional geoip database (using bundled GeoLiteCity database by default)
  #geoip_database    "/path/to/your/GeoIPCity.dat"

  #enable_key_country_code geoip_country

  # Set the fields to add using placeholders (at least one setting is required.)
  <record>
    #city            ${city["host"]}
    geoip_latitude        ${latitude["host"]}
    geoip_longitude       ${longitude["host"]}
    geoip_country_code3   ${country_code3["host"]}
    geoip_country         ${country_code["host"]}
    country_name    ${country_name["host"]}
    #dma             ${dma_code["host"]}
    #area            ${area_code["host"]}
    #region          ${region["host"]}
    geoip_location_properties  '{ "lat" : ${latitude["host"]}, "lon" : ${longitude["host"]} }'
    geoip_location_string      ${latitude["host"]},${longitude["host"]}
    geoip_location_array       '[${longitude["host"]},${latitude["host"]}]'
  </record>

  # Settings for tag
  remove_tag_prefix ua_parsed.
  tag               parsed.${tag}

  # To avoid a stacktrace error caused by a `[null, null]` array in elasticsearch.
  skip_adding_null_record  true

  # Set log_level for fluentd-v0.10.43 or earlier (default: warn)
  log_level         info

  # Set buffering time (default: 0s)
  flush_interval    1s
</match>

<match *.**>
  type copy
  <store>
    type stdout
  </store>
</match>

Any of the geoip_location formats is fine. I fed in a few arbitrary IPs to check.

2015-10-07 01:35:08 +0900 geoip.apache.access: {"host":"1.0.0.0","user":null,"method":"POST","path":"/wp-admin/admin-ajax.php","code":200,"size":580,"referer":"http://192.168.1.10/wp-admin/index.php","agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36","agent_name":"Chrome","agent_category":"pc","agent_os":"Mac OSX","agent_os_version":"10.10.5","agent_version":"45.0.2454.101","agent_vendor":"Google","geoip_latitude":-27.0,"geoip_longitude":133.0,"geoip_country_code3":"AUS","geoip_country":"AU","country_name":"Australia","geoip_location_properties":{"lat":-27.0,"lon":133.0},"geoip_location_string":"-27.0,133.0","geoip_location_array":[133.0,-27.0]}
2015-10-07 01:36:08 +0900 geoip.apache.access: {"host":"128.0.0.0","user":null,"method":"POST","path":"/wp-admin/admin-ajax.php","code":200,"size":580,"referer":"http://192.168.1.10/wp-admin/index.php","agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36","agent_name":"Chrome","agent_category":"pc","agent_os":"Mac OSX","agent_os_version":"10.10.5","agent_version":"45.0.2454.101","agent_vendor":"Google","geoip_latitude":46.0,"geoip_longitude":25.0,"geoip_country_code3":"ROU","geoip_country":"RO","country_name":"Romania","geoip_location_properties":{"lat":46.0,"lon":25.0},"geoip_location_string":"46.0,25.0","geoip_location_array":[25.0,46.0]}
2015-10-07 01:37:08 +0900 geoip.apache.access: {"host":"114.170.237.217","user":null,"method":"POST","path":"/wp-admin/admin-ajax.php","code":200,"size":580,"referer":"http://192.168.1.10/wp-admin/index.php","agent":"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36","agent_name":"Chrome","agent_category":"pc","agent_os":"Mac OSX","agent_os_version":"10.10.5","agent_version":"45.0.2454.101","agent_vendor":"Google","geoip_latitude":35.689998626708984,"geoip_longitude":139.69000244140625,"geoip_country_code3":"JPN","geoip_country":"JP","country_name":"Japan","geoip_location_properties":{"lat":35.689998626708984,"lon":139.69000244140625},"geoip_location_string":"35.689998626708984,139.69000244140625","geoip_location_array":[139.69000244140625,35.689998626708984]}
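
The three geoip_location_* fields carry the same point in three different shapes. The easy thing to trip over is the order: the string form is "lat,lon", while the array form is [lon, lat]. A small Python sketch of the three shapes, using the values from the first record above (`location_fields` is just a name for this sketch):

```python
def location_fields(latitude, longitude):
    """Build the three location representations used in the config above."""
    return {
        # Object form: explicit keys, so the order does not matter.
        "geoip_location_properties": {"lat": latitude, "lon": longitude},
        # String form: latitude first, then longitude.
        "geoip_location_string": f"{latitude},{longitude}",
        # Array form: longitude first, then latitude.
        "geoip_location_array": [longitude, latitude],
    }


fields = location_fields(-27.0, 133.0)
print(fields)
```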

Sending to Elasticsearch

uken/fluent-plugin-elasticsearch · GitHub

# td-agent-gem install fluent-plugin-elasticsearch

/etc/td-agent/td-agent.conf

 :
<match parsed.**>
  type elasticsearch
  hosts es1.containers.dev:9200,es2.containers.dev:9200
  type_name access
  logstash_format true
  logstash_prefix apache_log_wordpress
  logstash_dateformat %Y.%m
  flush_interval 10s
</match>
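
With logstash_format enabled, the index name is built from logstash_prefix and the event time formatted with logstash_dateformat, so this config produces one index per month. Here is a Python sketch of that naming. As far as I know the plugin formats the date in UTC by default (its utc_index option), but treat that as an assumption and check the plugin README. `logstash_index` is a name made up for this sketch.

```python
from datetime import datetime, timezone


def logstash_index(prefix, dateformat, event_time, utc_index=True):
    """Return the index name for an event, e.g. prefix-2015.10."""
    if utc_index:
        # Assumption: the date part is computed in UTC unless disabled.
        event_time = event_time.astimezone(timezone.utc)
    return f"{prefix}-{event_time.strftime(dateformat)}"


event_time = datetime.fromisoformat("2015-10-07T01:35:08+09:00")
print(logstash_index("apache_log_wordpress", "%Y.%m", event_time))

# Around a month boundary, the UTC conversion decides which index is used.
edge = datetime.fromisoformat("2015-11-01T01:00:00+09:00")
print(logstash_index("apache_log_wordpress", "%Y.%m", edge))
print(logstash_index("apache_log_wordpress", "%Y.%m", edge, utc_index=False))
```

The first call gives apache_log_wordpress-2015.10, matching the _index of the document shown later in this post.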

Left as is, the geoip latitude/longitude fields are not mapped as geo_point, so I map the type explicitly in advance with an index template.

curl -XPUT 'es1.containers.dev:9200/_template/apache_log/?pretty' -d '
{
  "template": "apache_log*",
  "mappings": {
    "access": {
      "properties": {
        "geoip_location_properties": {
          "type": "geo_point"
        },
        "geoip_location_string": {
          "type": "geo_point"
        },
        "geoip_location_array": {
          "type": "geo_point"
        }
      }
    }
  }
}
'
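
The template only takes effect for indices whose name matches "apache_log*", so the logstash_prefix has to start with apache_log. The Python sketch below rebuilds the same request body and checks the name match. The helper names are made up, and using fnmatch to stand in for Elasticsearch's wildcard matching is an approximation I am assuming is close enough for a simple trailing `*`.

```python
import json
from fnmatch import fnmatchcase

GEO_FIELDS = [
    "geoip_location_properties",
    "geoip_location_string",
    "geoip_location_array",
]


def build_template(pattern, type_name, geo_fields):
    """Build an index template body that maps the given fields to geo_point."""
    properties = {field: {"type": "geo_point"} for field in geo_fields}
    return {"template": pattern, "mappings": {type_name: {"properties": properties}}}


def template_applies(pattern, index_name):
    # Assumption: shell-style "*" matching approximates the template pattern.
    return fnmatchcase(index_name, pattern)


body = build_template("apache_log*", "access", GEO_FIELDS)
print(json.dumps(body, indent=2))
print(template_applies("apache_log*", "apache_log_wordpress-2015.10"))
```

Note that a template is applied when an index is created, so it needs to be registered before the first log is sent.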

After sending some logs through, the documents look like this:

{
  "_index": "apache_log_wordpress-2015.10",
  "_type": "access",
  "_id": "AVBOD5OpAEC4whOefbLc",
  "_score": 1,
  "_source": {
    "host": "1.0.0.0",
    "user": null,
    "method": "POST",
    "path": "/wp-admin/admin-ajax.php",
    "code": 200,
    "size": 580,
    "referer": "http://192.168.1.10/wp-admin/index.php",
    "agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/45.0.2454.101 Safari/537.36",
    "agent_name": "Chrome",
    "agent_category": "pc",
    "agent_os": "Mac OSX",
    "agent_os_version": "10.10.5",
    "agent_version": "45.0.2454.101",
    "agent_vendor": "Google",
    "geoip_latitude": -27,
    "geoip_longitude": 133,
    "geoip_country_code3": "AUS",
    "geoip_country": "AU",
    "country_name": "Australia",
    "geoip_location_properties": {
      "lat": -27,
      "lon": 133
    },
    "geoip_location_string": "-27.0,133.0",
    "geoip_location_array": [133, -27],
    "@timestamp": "2015-10-07T01:35:08+09:00"
  }
}

Checking the mapping as well

# curl -XGET 'es1.containers.dev:9200/apache_log_wordpress-2015.10/_mapping/?pretty'
{
  "apache_log_wordpress-2015.10" : {
    "mappings" : {
      "access" : {
        "properties" : {
          "@timestamp" : {
            "type" : "date",
            "format" : "dateOptionalTime"
          },
          "agent" : {
            "type" : "string"
          },
          "agent_category" : {
            "type" : "string"
          },
          "agent_name" : {
            "type" : "string"
          },
          "agent_os" : {
            "type" : "string"
          },
          "agent_os_version" : {
            "type" : "string"
          },
          "agent_vendor" : {
            "type" : "string"
          },
          "agent_version" : {
            "type" : "string"
          },
          "code" : {
            "type" : "long"
          },
          "country_name" : {
            "type" : "string"
          },
          "geoip_country" : {
            "type" : "string"
          },
          "geoip_country_code3" : {
            "type" : "string"
          },
          "geoip_latitude" : {
            "type" : "double"
          },
          "geoip_location_array" : {
            "type" : "geo_point"
          },
          "geoip_location_properties" : {
            "type" : "geo_point"
          },
          "geoip_location_string" : {
            "type" : "geo_point"
          },
          "geoip_longitude" : {
            "type" : "double"
          },
          "host" : {
            "type" : "string"
          },
          "method" : {
            "type" : "string"
          },
          "path" : {
            "type" : "string"
          },
          "referer" : {
            "type" : "string"
          },
          "size" : {
            "type" : "long"
          }
        }
      }
    }
  }
}
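
Rather than eyeballing the whole response, the geo_point fields can be picked out programmatically. A minimal Python sketch, assuming the response has the index / mappings / type / properties layout shown above (the sample here is an abbreviated copy of it, and `fields_of_type` is a name made up for this sketch):

```python
def fields_of_type(mapping_response, wanted_type):
    """Return {index_name: [field, ...]} for fields mapped as wanted_type."""
    found = {}
    for index_name, index_body in mapping_response.items():
        for type_body in index_body.get("mappings", {}).values():
            for field, spec in type_body.get("properties", {}).items():
                if spec.get("type") == wanted_type:
                    found.setdefault(index_name, []).append(field)
    return found


# Abbreviated copy of the _mapping response above.
response = {
    "apache_log_wordpress-2015.10": {
        "mappings": {
            "access": {
                "properties": {
                    "geoip_latitude": {"type": "double"},
                    "geoip_location_array": {"type": "geo_point"},
                    "geoip_location_properties": {"type": "geo_point"},
                    "geoip_location_string": {"type": "geo_point"},
                    "host": {"type": "string"},
                }
            }
        }
    }
}

print(fields_of_type(response, "geo_point"))
```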

Wrapping up

A quick check in Kibana showed no problems with charts or maps, and @timestamp correctly holds the request time from the log.
That will do as a first introduction.