Discussion:
grok to parse python dict
Vlad Vintila
2012-05-07 15:20:54 UTC
Permalink
Hello,


I'm not sure if this can be done, and I admit that it's a strange request
to have, but here goes.

I'm trying to make a grok filter that parses a python dict that has a
variable number of keys.

So if I have:

{ 'key1': 'value1', 'key2': 'value2', 'key3': 'value3' } I would like grok
to add 3 fields with the key names as the field names, and their respective
values.

Something like this:
grok {
  type => "mylog"
  pattern => "'%{STRING:type}': '%{DATA:value}'"
  add_field => ["%{type}", "%{value}"]
}

This doesn't work, of course, as it would only match once; also, %{type}
does not get replaced with its value (as opposed to %{value}), and removing
the quotes results in a syntax error.


The alternative is to have each key (they are many, but finite) defined in
my patterns file, but I am hoping you guys will show me the smart way.



Thanks,
Vlad Vintila
Pete Fritchman
2012-05-07 15:29:51 UTC
Permalink
Post by Vlad Vintila
I'm not sure if this can be done, and I admit that it's a strange request
to have, but here goes.
I'm trying to make a grok filter that parses a python dict that has a
variable number of keys.
Could you change the application to log this dict serialized to JSON?
Then you could just use the json filter (or if the entire input line
was JSON, set the input format to "json").
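For instance, the application-side change might look like this (a minimal
sketch; the logger setup and the dict contents are placeholders):

```python
import json
import logging

logging.basicConfig(format="%(message)s")
log = logging.getLogger("myapp")  # hypothetical logger name

record = {'key1': 'value1', 'key2': 'value2', 'key3': 'value3'}

# Serialize the dict to JSON before logging, so the logstash json
# filter (or a "json" input format) can expand the keys as fields.
log.warning(json.dumps(record))
```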
--
petef
Vlad Vintila
2012-05-07 16:00:59 UTC
Permalink
Thanks for the quick reply. I will see if I can change the format.
This would definitely solve the problem for me, but I am still curious
whether there is an answer to the actual task: parsing a dict or hash.

Thanks,
Vlad Vintila
Vlad Vintila
2012-05-08 10:24:00 UTC
Permalink
I changed my format to JSON. Can you give a config sample for the json
filter that puts the keys as fields with their respective values as values?

The only thing I've found on this is this example:
https://gist.github.com/1543170 , but I don't want that. (Also, I read
that you cannot get at the nested data yet.)

Thanks,
Vlad Vintila
Pete Fritchman
2012-05-08 14:13:06 UTC
Permalink
Post by Vlad Vintila
I changed my format to json. Can you give a config sample for the json
filter that puts the keys as fields with their respective values as values?
Only thing I've found on this is this
example: https://gist.github.com/1543170 , but I don't want that.(also I
Right now it's only nested. If you log the entire log event in JSON,
you can set "format => json" on your file input.
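A minimal input stanza along those lines might look like this (logstash
~1.1-era syntax; the path and type are placeholders):

```
input {
  file {
    type => "mylog"
    path => "/var/log/myapp.log"   # placeholder path
    format => "json"               # each line is a complete JSON event
  }
}
```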
Post by Vlad Vintila
read that you cannot get nested the data, yet)
I'm working on a patch that lets you do %{foo.bar} to get at the nested data.
--
petef
Vlad Vintila
2012-05-08 14:29:04 UTC
Permalink
I don't have the entire log event in JSON, just a part of it. If I had it
all in JSON and set "format => json" on my file input, would that achieve
what I asked for in the first place?



Thanks,
Vlad Vintila
Pete Fritchman
2012-05-08 14:36:38 UTC
Permalink
Post by Vlad Vintila
I don't have the entire log event in JSON, just a part of it. If I had it
all in JSON and set "format => json" on my file input, would that achieve
what I asked for in the first place?
Yes -- all of the JSON keys will be the "top level" namespace for the
event (i.e. keys in @event.fields).
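Concretely (illustrative; field layout as of logstash ~1.1):

```
# incoming line, with format => json on the input:
{"key1": "value1", "key2": "value2", "key3": "value3"}

# resulting event fields:
#   @event.fields["key1"] == "value1"
#   @event.fields["key2"] == "value2"
#   @event.fields["key3"] == "value3"
```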
--
petef
Vlad Vintila
2012-05-08 15:11:36 UTC
Permalink
I understand.

I don't get why you suggested changing the dict to JSON if there is no
way to parse it entirely without specifying any keys.

Thanks,
Vlad Vintila
Pete Fritchman
2012-05-08 15:41:56 UTC
Permalink
Post by Vlad Vintila
I don't get why you suggested changing the dict to JSON if there is no
way to parse it entirely without specifying any keys.
It all depends on what you're looking to do with logstash; it's a pretty
flexible tool, and it's hard to predict your exact use case when all
you asked was "I want to parse a python dict". Yes, right now after
expanding the JSON with the filter, you can't access those "sub-keys"
with %{foo} expansion, but it's still useful to expand the JSON if
you're just outputting to elasticsearch and doing more there.
--
petef
Vlad Vintila
2012-05-09 12:47:25 UTC
Permalink
Wouldn't a feature to automatically parse JSON be very useful? (Even if
it's part of the event, not the whole.)

It would come in handy, for example, when parsing POST events of an API.

Just a suggestion for some future release.

Thanks,
Vlad Vintila
Jordan Sissel
2012-05-09 17:18:54 UTC
Permalink
Post by Vlad Vintila
Wouldn't a feature to automatically parse JSON be very useful? (Even if
it's part of the event, not the whole.)
It would come in handy, for example, when parsing POST events of an API.
You can do this today with grok (to isolate the JSON blob) plus the json
filter to treat it as JSON.
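A sketch of that combination (the field names and the regex capture are
made up for illustration, and the json filter's "source" option is from
later releases than the one current in this thread -- check your version's
docs for the exact setting):

```
filter {
  grok {
    type => "mylog"
    # isolate the {...} blob into its own field (names are assumptions)
    pattern => "^%{DATA:prefix}(?<json_blob>\{.*\})$"
  }
  json {
    # expand the captured blob into event fields; "source" is the
    # option name in later logstash releases
    source => "json_blob"
  }
}
```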

-Jordan
