Preface
- This week a friend dug up the Rapid7 open DNS dataset I had downloaded a while ago on an old computer. After copying it over on a hard drive I started cleaning the data. It's from 2020, but I still want to analyze it; the groundwork for that write-up is already laid, but it will have to wait until a newer batch of data has been imported.
- This post only covers setting up the environment and importing the data; I may also write a web backend that wraps the Elasticsearch query API.
Setting up the Elasticsearch service
- I only have one ESXi server, and a fairly low-spec one. Using Docker would have meant growing the disk on my existing Photon host, so it was simpler to spin up a dedicated VM just for the Elasticsearch database. I chose Bitnami's Elasticsearch appliance, which works more or less out of the box (some configuration is still needed, but it beats installing everything from scratch).
- First download the OVA virtual machine file from Bitnami's official website, then add a new VM on the ESXi server and import it.

- After choosing the network and other settings, remember to untick "power on automatically", because we need to change the configuration before first boot.

- So that the VM can be managed over SSH right after boot, I entered my own SSH public key here. You can leave it empty and manage the machine through the VMware console instead.

- I gave the VM 4 cores, 8 GB of RAM, and a 1 TB disk to start with. The imported data alone should exceed 300 GB, and indexing, aggregations, and future data additions will need considerably more space.

- After powering on, the console shows the IP address. Connect over SSH, change the password first, then check whether the service on port 9200 started properly.
bitnami@debian:~$ passwd bitnami
Changing password for bitnami.
Current password:
New password:
Retype new password:
passwd: password updated successfully
bitnami@debian:~$ curl localhost:9200
{
"name" : "debian",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "U0FGXKwLREmMS-KtR7ksvQ",
"version" : {
"number" : "8.8.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "98e1271edf932a480e4262a471281f1ee295ce6b",
"build_date" : "2023-06-26T05:16:16.196344851Z",
"build_snapshot" : false,
"lucene_version" : "9.6.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
bitnami@debian:~$
bitnami@debian:~$ free
total used free shared buff/cache available
Mem: 8135672 4620120 3297460 556 218092 3283232
Swap: 0 0 0
Exposing the Elasticsearch service (optional && not recommended)
- First of all, exposing the service carries security risks, unauthenticated access in particular, so I strongly advise against exposing it directly; write a backend that performs the queries on the service's behalf instead.
- The request above was made locally. To reach the service from other hosts, edit the Elasticsearch configuration file: in the network section shown below, change host to the machine's IP address (10.168.1.211 in my case). While you're at it, set xpack.security.enabled to true so that password authentication can be added later. A sketch of the edited section follows the config dump below.
bitnami@debian:~$ cat /opt/bitnami/elasticsearch/config/elasticsearch.yml
http:
port: "9200"
path:
data: /bitnami/elasticsearch/data
transport:
port: "9300"
action:
destructive_requires_name: "true"
network:
host: 127.0.1.1
publish_host: 127.0.1.1
bind_host: 127.0.0.1
node:
name: debian
discovery:
type: single-node
xpack:
security:
enabled: "false"
ml:
enabled: "false"
- After changing the config, restart the Elasticsearch service; Bitnami's ready-made control script ctlscript.sh with the restart command does the job.
bitnami@debian:~$ sudo vi /opt/bitnami/elasticsearch/config/elasticsearch.yml
bitnami@debian:~$ sudo /opt/bitnami/ctlscript.sh help
usage: /opt/bitnami/ctlscript.sh [command]
/opt/bitnami/ctlscript.sh [command] [service]
Commands:
help show help menu
start start the service(s)
stop stop the service(s)
restart restart or start the service(s)
status show the status of the service(s)
bitnami@debian:~$ sudo /opt/bitnami/ctlscript.sh restart
restarting services...
- Finally, edit the firewall configuration file /etc/nftables.conf, add port 9200 to the allowed inbound rules, and re-run nft to apply the ruleset (a quick verification follows the commands below). The service is then reachable from other hosts.
chain inbound {
# By default, drop all traffic unless it meets a filter
# criteria specified by the rules that follow below.
type filter hook input priority 0; policy drop;
# Allow traffic from established and related packets, drop invalid
ct state vmap { established : accept, related : accept, invalid : drop }
# Allow loopback traffic.
iifname lo accept
# Jump to chain according to layer 3 protocol using a verdict map
meta protocol vmap { ip : jump inbound_ipv4, ip6 : jump inbound_ipv6 }
# Allow selected ports for IPv4 and IPv6.
- tcp dport { 22 } accept
+ tcp dport { 22, 9200 } accept
# Uncomment to enable logging of denied inbound traffic
# log prefix "[nftables] Inbound Denied: " counter drop
}
bitnami@debian:~$ sudo /usr/sbin/nft -f /etc/nftables.conf
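- To confirm the new rule is actually loaded, you can grep the live ruleset (a quick sanity check):
bitnami@debian:~$ sudo /usr/sbin/nft list ruleset | grep 9200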
Setting an Elasticsearch password
- For safety's sake, and to avoid falling victim to unauthenticated access, be sure to set up account credentials.
➜ ~ curl http://10.168.1.211:9200
{
"name" : "debian",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "U0FGXKwLREmMS-KtR7ksvQ",
"version" : {
"number" : "8.8.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "98e1271edf932a480e4262a471281f1ee295ce6b",
"build_date" : "2023-06-26T05:16:16.196344851Z",
"build_snapshot" : false,
"lucene_version" : "9.6.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
- Change into Elasticsearch's bin directory and run elasticsearch-setup-passwords to get an interactive prompt, then enter the passwords you want to set. (The Bitnami image lacks the bundled JDK the tool expects under elasticsearch/jdk, so I first symlinked the system Java into that path.)
bitnami@debian:/opt/bitnami$ sudo mkdir -p /opt/bitnami/elasticsearch/jdk/bin/
bitnami@debian:/opt/bitnami$ sudo ln -s /opt/bitnami/java/bin/java /opt/bitnami/elasticsearch/jdk/bin/java
bitnami@debian:/opt/bitnami/elasticsearch/bin$ sudo ./elasticsearch-setup-passwords interactive
******************************************************************************
Note: The 'elasticsearch-setup-passwords' tool has been deprecated. This command will be removed in a future release.
******************************************************************************
Initiating the setup of passwords for reserved users elastic,apm_system,kibana,kibana_system,logstash_system,beats_system,remote_monitoring_user.
You will be prompted to enter passwords as the process progresses.
Please confirm that you would like to continue [y/N]y
Enter password for [elastic]:
Reenter password for [elastic]:
Enter password for [apm_system]:
Reenter password for [apm_system]:
Enter password for [kibana_system]:
Reenter password for [kibana_system]:
Enter password for [logstash_system]:
Reenter password for [logstash_system]:
Enter password for [beats_system]:
Reenter password for [beats_system]:
Enter password for [remote_monitoring_user]:
Reenter password for [remote_monitoring_user]:
Changed password for user [apm_system]
Changed password for user [kibana_system]
Changed password for user [kibana]
Changed password for user [logstash_system]
Changed password for user [beats_system]
Changed password for user [remote_monitoring_user]
Changed password for user [elastic]
- Restart the Elasticsearch service once more (sudo /opt/bitnami/ctlscript.sh restart); requests now require the account and password.
➜ ~ curl http://10.168.1.211:9200 -u elastic:password
{
"name" : "debian",
"cluster_name" : "elasticsearch",
"cluster_uuid" : "U0FGXKwLREmMS-KtR7ksvQ",
"version" : {
"number" : "8.8.2",
"build_flavor" : "default",
"build_type" : "tar",
"build_hash" : "98e1271edf932a480e4262a471281f1ee295ce6b",
"build_date" : "2023-06-26T05:16:16.196344851Z",
"build_snapshot" : false,
"lucene_version" : "9.6.0",
"minimum_wire_compatibility_version" : "7.17.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "You Know, for Search"
}
Installing Logstash
bitnami@debian:~$ wget -qO - https://artifacts.elastic.co/GPG-KEY-elasticsearch | sudo gpg --dearmor -o /usr/share/keyrings/elastic-keyring.gpg
bitnami@debian:~$ sudo apt-get install apt-transport-https
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
apt-transport-https is already the newest version (2.2.4).
0 upgraded, 0 newly installed, 0 to remove and 8 not upgraded.
bitnami@debian:~$ echo "deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main" | sudo tee -a /etc/apt/sources.list.d/elastic-8.x.list
deb [signed-by=/usr/share/keyrings/elastic-keyring.gpg] https://artifacts.elastic.co/packages/8.x/apt stable main
bitnami@debian:~$ sudo apt-get update && sudo apt-get install logstash
Hit:1 http://security.debian.org/debian-security bullseye-security InRelease
Get:2 https://artifacts.elastic.co/packages/8.x/apt stable InRelease [10.4 kB]
Hit:3 http://http.us.debian.org/debian bullseye InRelease
Get:4 https://artifacts.elastic.co/packages/8.x/apt stable/main amd64 Packages [56.9 kB]
Hit:5 http://http.us.debian.org/debian bullseye-updates InRelease
Fetched 67.3 kB in 1s (59.6 kB/s)
Reading package lists... Done
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following NEW packages will be installed:
logstash
0 upgraded, 1 newly installed, 0 to remove and 8 not upgraded.
Need to get 346 MB of archives.
After this operation, 602 MB of additional disk space will be used.
Get:1 https://artifacts.elastic.co/packages/8.x/apt stable/main amd64 logstash amd64 1:8.9.0-1 [346 MB]
Fetched 346 MB in 1min 58s (2,939 kB/s)
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package logstash.
(Reading database ... 21193 files and directories currently installed.)
Preparing to unpack .../logstash_1%3a8.9.0-1_amd64.deb ...
Unpacking logstash (1:8.9.0-1) ...
Setting up logstash (1:8.9.0-1) ...
- After installing Logstash I also installed logstash-filter-tld, a plugin that parses out the top-level domain. Since the Rapid7 DNS dataset is gzip-compressed I wanted logstash-codec-gzip_lines too, but it turned out to be unmaintained and can no longer decompress gzip correctly, so I feed the decompressed stream in over stdin instead (the actual import command appears after the pipeline config further down).
bitnami@debian:~$ cd /usr/share/logstash
bitnami@debian:/usr/share/logstash$ sudo ./bin/logstash-plugin install logstash-filter-tld
Using bundled JDK: /usr/share/logstash/jdk
Validating logstash-filter-tld
Resolving mixin dependencies
Installing logstash-filter-tld
Installation successful
/usr/share/logstash/vendor/bundle/jruby/2.6.0/gems/public_suffix-3.1.1/data
bitnami@debian:/usr/share/logstash$ sudo bin/logstash -e 'input { stdin {} } output { stdout {} }'
Using bundled JDK: /usr/share/logstash/jdk
[INFO ] 2023-08-03 05:49:47.141 [Api Webserver] agent - Successfully started Logstash API endpoint {:port=>9600, :ssl_enabled=>false}
[INFO ] 2023-08-03 05:49:47.243 [Converge PipelineAction::Create<main>] Reflections - Reflections took 54 ms to scan 1 urls, producing 132 keys and
464 values
[INFO ] 2023-08-03 05:49:47.334 [Converge PipelineAction::Create<main>] javapipeline - Pipeline `main` is configured with `pipeline.ecs_compatibili
ty: v8` setting. All plugins in this pipeline will default to `ecs_compatibility => v8` unless explicitly configured otherwise.
[INFO ] 2023-08-03 05:49:47.352 [[main]-pipeline-manager] javapipeline - Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>4, "pipeline.
batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>500, "pipeline.sources"=>["config string"], :thread=>"#<Thread:0x7acfec0d@/u
sr/share/logstash/logstash-core/lib/logstash/java_pipeline.rb:134 run>"}
[INFO ] 2023-08-03 05:49:47.616 [[main]-pipeline-manager] javapipeline - Pipeline Java execution initialization time {"seconds"=>0.26}
[INFO ] 2023-08-03 05:49:47.630 [[main]-pipeline-manager] javapipeline - Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[INFO ] 2023-08-03 05:49:47.638 [Agent thread] agent - Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
hi
{
"event" => {
"original" => "hi"
},
"@timestamp" => 2023-08-03T05:49:50.804958539Z,
"host" => {
"hostname" => "debian"
},
"message" => "hi",
"@version" => "1"
}
^C[WARN ] 2023-08-03 05:49:57.973 [SIGINT handler] runner - SIGINT received. Shutting down.
[INFO ] 2023-08-03 05:49:58.054 [[main]-pipeline-manager] javapipeline - Pipeline terminated {"pipeline.id"=>"main"}
[INFO ] 2023-08-03 05:49:58.983 [Converge PipelineAction::StopAndDelete<main>] pipelinesregistry - Removed pipeline from registry successfully {:pipeline_id=>:main}
[INFO ] 2023-08-03 05:49:58.986 [LogStash::Runner] runner - Logstash shut down.
Rapid7 DNS dataset structure
bitnami@debian:~$ pigz -dc 2020-02-21-1582243548-fdns_any.json.gz |head -n1 | pigz > test.json.gz
bitnami@debian:~$ pigz -dc test.json.gz
{"timestamp":"1582243897","name":"0.af.3ca9.ip4.static.sl-reverse.com","type":"a","value":"169.60.175.0"}
- Use Logstash to generate test data, then debug the output as the events are modified. Run sudo bin/logstash -f /home/bitnami/logstash.conf, where the trailing path points at the config file; in other words, save the following content as logstash.conf.
input {
generator {
count => 1
message => '{"timestamp":"1582243897","name":"0.af.3ca9.ip4.static.sl-reverse.com","type":"a","value":"169.60.175.0"}'
codec => json_lines {
charset => "UTF-8"
}
}
}
filter {
}
output {
stdout {
codec => rubydebug {
}
}
}
bitnami@debian:/usr/share/logstash$ sudo bin/logstash -f /home/bitnami/logstash.conf
{
"name" => "0.af.3ca9.ip4.static.sl-reverse.com",
"type" => "a",
"@timestamp" => 2023-08-03T06:16:34.884998329Z,
"@version" => "1",
"timestamp" => "1582243897",
"value" => "169.60.175.0",
"host" => {
"name" => "debian"
}
}
Importing data with Logstash
- Alright, now that you've learned how to drive Logstash, let's add just a little more detail and arrive at the config below (the actual import command follows it).
input {
stdin {
codec => json {
}
}
}
filter {
tld {
source => "name"
}
date {
locale => "zh-CN"
timezone => "Asia/Shanghai"
match => ["timestamp","UNIX"]
target => "timestamp"
}
mutate {
add_field => {
"[@metadata][index]" => "%{[tld][tld]}"
"domain" => "%{[tld][domain]}"
}
remove_field => ["event","@timestamp","host","@version","tld"]
}
}
output {
elasticsearch {
document_id => "%{name}"
hosts => ["<http://localhost>"]
user => "elastic"
password => "password"
index => "rapid7_fdns_%{[@metadata][index]}"
template => "/home/bitnami/rapid7_fdns.json"
template_overwrite => "true"
}
}
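- With the pipeline above saved as /home/bitnami/logstash.conf, the import itself is a sketch like the following: pigz decompresses the snapshot to stdout and the stdin input picks it up.
bitnami@debian:/usr/share/logstash$ pigz -dc ~/2020-02-21-1582243548-fdns_any.json.gz | sudo bin/logstash -f /home/bitnami/logstash.conf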
- This reads JSON records from stdin as the input source, then runs the tld filter over each event's name field to produce a tld object. Punycode domains get dropped here because the plugin cannot handle them; those domains mostly just redirect to a homepage anyway and are of little value, so I don't plan to process that subset separately.
- Next, the event's timestamp field is converted to a date type; it is a Unix timestamp, and storing it as a string in the database would make little sense.
- Then the root domain extracted by the tld plugin is copied into a new field, domain, which makes exact searches on a given root domain easy; the TLD goes into [@metadata][index] to become part of the Elasticsearch index name, and the now-useless fields are removed.
- Finally the data is written into Elasticsearch, with the index name formed by appending the TLD to the prefix rapid7_fdns_; this works because I created an index template bound to that index pattern. Save the template below as /home/bitnami/rapid7_fdns.json:
{
"index_patterns": ["rapid7_fdns_*"],
"template": {
"settings": {
"index": {
"number_of_shards": 1,
"max_result_window": "100000",
"number_of_replicas": 0
}
},
"mappings": {
"properties": {
"domain": {
"type": "keyword"
},
"name": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
},
"timestamp": {
"type": "date"
},
"type": {
"type": "keyword"
},
"value": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword"
}
}
}
}
}
},
"priority": 51,
"version": 1,
"_meta": {
"description": "rapid7_fdns"
}
}
Index template notes
- After the Logstash processing, a document lands in Elasticsearch looking like this:
"_source": {
"name": "0.red-80-34-227.staticip.rima-tde.net",
"type": "a",
"value": "80.34.227.0",
"timestamp": "2020-02-21T00:10:52.000Z",
"domain": "rima-tde.net"
}
- The type and domain fields are only ever matched exactly during searches and need no tokenization, so they are typed as keyword; the remaining fields are analyzed normally (see the template above). You can verify that the template was registered as sketched below.
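- A quick check that the template landed (on ES 8, Logstash registers it as a composable index template, so it shows up under the _index_template API):
➜ ~ curl -u elastic:password 'http://localhost:9200/_index_template?pretty'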
Import results

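- To watch progress while the import runs, list the matching indices with their document counts and sizes (a sketch):
➜ ~ curl -u elastic:password 'http://localhost:9200/_cat/indices/rapid7_fdns_*?v'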
Backend search logic
- Since each Elasticsearch index name is the prefix rapid7_fdns_ joined with the top-level domain's suffix, the backend uses https://github.com/emo-cat/tldextract-rs to parse the domain being searched. For example, searching for kali-team.cn parses out the suffix cn, so the Elasticsearch index is rapid7_fdns_cn and the search request is:
➜ ~ curl 'http://localhost:9200/rapid7_fdns_cn/_search' -X POST -u elastic:password -H 'Content-Type: application/json' --data-raw '{
  "query": {
    "term": {
      "domain": "kali-team.cn"
    }
  },
  "size": 10,
  "from": 0,
  "sort": []
}'
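- The backend issues the same query through the official elasticsearch Rust client; a minimal example (crates assumed: elasticsearch, tokio with the macros feature, serde_json):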
use elasticsearch::auth::Credentials;
use elasticsearch::http::transport::{SingleNodeConnectionPool, TransportBuilder};
use elasticsearch::http::Url;
use elasticsearch::{Elasticsearch, SearchParts};
use serde_json::json;

#[tokio::main]
async fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Basic auth with the credentials configured earlier.
    let credentials = Credentials::Basic("elastic".into(), "password".into());
    let url = Url::parse("http://localhost:9200")?;
    let conn_pool = SingleNodeConnectionPool::new(url);
    let transport = TransportBuilder::new(conn_pool).auth(credentials).build()?;
    let client = Elasticsearch::new(transport);
    // Exact match on the keyword-typed `domain` field, scoped to the TLD's index.
    let response = client
        .search(SearchParts::Index(&["rapid7_fdns_com"]))
        .from(0)
        .size(10)
        .body(json!({
            "query": { "term": { "domain": "github.com" } }
        }))
        .send()
        .await?;
    println!("{:?}", response.text().await);
    Ok(())
}

Other notes
- To delete every index and start over (destructive, so handle with care):
➜ ~ curl "http://localhost:9200/_cat/indices?format=json" -u elastic:password | jq -r ".[].index" | while read line; do curl -X DELETE "http://localhost:9200/$line" -u elastic:password; done
- The import creates one index per TLD, so the default shards-per-node limit is easy to hit; the following persistent setting raises the cap (applying it is sketched after the JSON):
{
"persistent": {
"cluster.max_shards_per_node": "8192"
}
}
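- A sketch of applying that body through the cluster settings API:
➜ ~ curl -X PUT 'http://localhost:9200/_cluster/settings' -u elastic:password -H 'Content-Type: application/json' -d '{"persistent":{"cluster.max_shards_per_node":"8192"}}'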
References