Using entgo to create and store API flows

By Tamir Hyman
7 min read
GO

As software developers, we need to strategize about how and where we want to save our data. Usually we save it in some database, but things get more complicated in a microservice architecture, where best practice is for each microservice to use its own datastore.

Developers in a microservice-based environment are often required to change code across different microservices, so they need to be experts in each microservice and in the database it uses.

At Seekret, we started a project to find flows of API calls. While finding the flows is a challenging task on its own, we also faced the dilemma of how we wanted to save them.

Let's start with a brief overview of what a flow is. A flow is a sequence of API calls with some logical connection between them. In this case, let's assume the connection is what we call “property transfer”: a value returned from one API is used in another.

Another important aspect of a flow is its elements. Each element represents an API call, so it has multiple parameters in both the request and the response, and each parameter has its own properties.
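To make the terminology concrete, here is a minimal sketch of these concepts as plain Go types. The type and field names are hypothetical, chosen only to illustrate the flow → element → parameter hierarchy and a property transfer; they are not the actual Seekret data model.

package flows

// Parameter is a single request or response parameter of an API call.
type Parameter struct {
    Name  string
    Value string
}

// Element represents one API call in a flow.
type Element struct {
    Endpoint string
    Request  []Parameter
    Response []Parameter
}

// PropertyTransfer links a value produced by one element to a parameter
// consumed by another element (the "logical connection" between calls).
type PropertyTransfer struct {
    FromElement, ToElement int    // indexes into Flow.Elements
    FromParam, ToParam     string // parameter names on each side
}

// Flow is a sequence of elements plus the property transfers between them.
type Flow struct {
    Elements  []Element
    Transfers []PropertyTransfer
}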

When we started talking about the data model, we automatically thought of a graph-based database. Every time we described a flow, it was always with nodes and edges, so it was the most natural choice.

For example, here is how we display a flow of 3 APIs in our system.

[Figure: a flow of three API calls as displayed in the Seekret UI]

There are many different graph databases, but we chose neo4j. Starting off, it looked cool: you can visualize your data and build your model as if it were a whiteboard. We created nodes for all our different objects (flow elements, parameters, etc.) and connected them with custom relations (FOLLOWED_BY, CONTAINS, PRODUCES, etc.).

At the beginning everything was fine, but as feature demands grew, so did the size of our queries. And as the amount of data grew, we started experiencing performance issues.

The challenge 

Just for reference, here is the neo4j query we used to return all the different flows:

MATCH (schema:SchemaVersion {version: $version, environment:$environment})
MATCH
(flow:Flow)-[:IN]-> (schema)
WHERE
$includeRemoved OR COALESCE(flow.removed, false) <> true
WITH flow
MATCH
(n) -[:APPEARS_IN]->(flow)
WITH
flow, collect(n) as nodes
MATCH
(e:Element) <-[:ENDS_IN]-(flow)
WITH
e as end, flow,nodes
CALL
apoc.path.expandConfig(flow, {whitelistNodes:nodes,terminatorNodes: [end],relationshipFilter: "STARTS_AT>|FOLLOWED_BY>",uniqueness:"NODE_PATH"})
YIELD path
where ALL(r in relationships(path) where ( 'FOLLOWED_BY' <> type(r)) or flow.flowIndex in r.flowIds)
WITH
flow, nodes, collect(path) as endpointPaths
OPTIONAL MATCH
path=(:Element)-[:PRODUCED]->(:OutParam)-[:LINKED_TO]->(:InParam)-[:CONSUMED_BY]-(:Element)
WHERE
ALL(x in nodes(path) where x in nodes) AND
ALL (x in relationships(path) where ("LINKED_TO" <> type(x)) or flow.flowIndex in x.flowIds)
WITH
flow, nodes, endpointPaths, collect(path) as parameterPaths
OPTIONAL MATCH
path = (e:Element) -[:GENERATED]->(p:GeneratedParam)
WHERE
e IN nodes AND
(p) -[:APPEARS_IN]->(flow)
WITH
flow, endpointPaths,collect(path) as generatedPaths, parameterPaths
RETURN
flow.flowIndex as flowId, generatedPaths, parameterPaths,endpointPaths

We probably could’ve done a better job optimizing this query. I guess any neo4j expert would want to fire all of us. But that’s exactly the point – to really use neo4j correctly, we needed a neo4j expert. It was almost impossible for new developers to learn this code – the entrance barrier was way too high.

So, we went back to the drawing board and looked for a different solution. Our key objectives were:

  • Something that would fit our data model (which has graph properties)

  • A lower entrance barrier

  • Simpler and more readable code

Enter entgo

Entgo is an entity framework in Golang that lets developers define a graph data model with nodes and edges and auto-generates an ORM and the relevant libraries for database handling. The main features that drew us towards ent were:

  • Graph data model

  • Using the database as code – no more long queries as string constants

  • Excellent migration system – migrating schema changes in neo4j was difficult

  • Different backend database support – as a young startup, we didn’t want to be too coupled with any specific database

  • Great community and docs – which makes the entrance barrier very low

  • Entgo is written in Golang, and so are most of our microservices

(It’s worth noting that neo4j also has an ORM, but it’s in Java.)
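To give a feel for the backend-independence and migration points above, here is a minimal sketch of how an ent client is typically opened and how auto-migration is run. The driver choice, DSN, and generated package path are illustrative assumptions.

package main

import (
    "context"
    "log"

    _ "github.com/lib/pq" // hypothetical choice of a Postgres driver; any supported backend works

    "example.com/flows/ent" // the package ent generates from the schemas (path is an assumption)
)

func main() {
    // Open a client against the backend of your choice; switching databases
    // is mostly a matter of changing the dialect and the DSN.
    client, err := ent.Open("postgres", "host=localhost user=app dbname=flows sslmode=disable")
    if err != nil {
        log.Fatalf("opening ent client: %v", err)
    }
    defer client.Close()

    // Auto-migration: ent diffs the generated schema against the database
    // and applies the required changes.
    if err := client.Schema.Create(context.Background()); err != nil {
        log.Fatalf("running schema migration: %v", err)
    }
}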

Using entgo to define a flow 

This is how we define a flow in our system. 

package schema

import (
    "entgo.io/ent"
    "entgo.io/ent/dialect/entsql"
    "entgo.io/ent/schema/edge"
    "entgo.io/ent/schema/field"
    "entgo.io/ent/schema/index"
)

// Flow holds the schema definition for the Flow entity.
type Flow struct {
    ent.Schema
}

// Indexes of the Flow.
func (Flow) Indexes() []ent.Index {
    return []ent.Index{
        index.Fields("environment", "unique_id").
            Unique(),
    }
}

// Fields of the Flow.
func (Flow) Fields() []ent.Field {
    return []ent.Field{
        field.String("unique_id").
            MaxLen(255),
        field.String("environment").
            MaxLen(255),
        field.Bool("is_removed"),
        field.String("name").Optional(),
        field.String("description").Optional(),
        field.Strings("labels").Optional(),
        field.Float("timestamp"),
    }
}

// Edges of the Flow.
func (Flow) Edges() []ent.Edge {
    return []ent.Edge{
        // A flow contains elements; deleting a flow cascades to its elements.
        edge.To("contains", Element.Type).
            Annotations(entsql.Annotation{
                OnDelete: entsql.Cascade,
            }),
    }
}

It can’t get any simpler than that. We define our different entities and their relationships in code, and entgo does all the hard work. We can even add schema validators to each field to catch errors earlier and validate input, as in the sketch below.
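For illustration, here is a hypothetical variation of the Fields() method above with a few validators added. The exact rules are assumptions, not the ones we use in production (it also assumes "regexp" is imported).

// Hypothetical validators on some of the fields above.
func (Flow) Fields() []ent.Field {
    return []ent.Field{
        field.String("unique_id").
            MaxLen(255).
            NotEmpty(), // reject empty IDs at the ORM layer
        field.String("environment").
            MaxLen(255).
            Match(regexp.MustCompile(`^[a-z0-9-]+$`)), // hypothetical naming rule
        field.Float("timestamp").
            Positive(),
    }
}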

Remember the hideous neo4j query from before? Here’s how we get all our flows now. 

flows, err := client.Flow.Query().Where(
    flow.IsRemoved(false),
    flow.Environment(env),
).WithContains(
    func(query *ent.ElementQuery) {
        // Eager-load the parameter edges of each element as well.
        query.WithConsume().WithGenerate().WithProduce()
    },
).All(ctx)

This simple (and readable!) query returns our results straight into Go structs; no need to handle unmarshaling data from the database or converting database column types to Go types.

The onboarding experience has never been easier, and maintaining complex database schemas over time has turned into a fairly simple task.
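As a small sketch of what consuming those results looks like, the query above hands back generated structs directly; the field and edge accessor names below follow ent's naming for the schema shown earlier, so treat them as assumptions.

for _, f := range flows {
    // Eager-loaded edges are available on the generated Edges struct.
    log.Printf("flow %q contains %d elements", f.UniqueID, len(f.Edges.Contains))
}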

What we didn’t expect 

It's important to note though, there is one major downside to this approach. Ent abstracts out the internal db implementation – but this comes at a price. It isn’t always easy to figure out what exact query will be performed on the db, and that might have a huge effect on performance.

For example, here are two different ways to write the exact same logic, but they result in two completely different queries on the database. One query uses indexes and is extremely fast, and the other isn’t.

Both queries look for a record that has one of two optional foreign keys.

This is the slow one: it looks for “a record which either has a foreign key with value X or a foreign key with value Y”. 

client.Record.Query().Where(
    record.Or(
        record.HasForeignKeyWith(fk.IDEQ(oldFK)),
        record.HasForeignKeyWith(fk.IDEQ(newFK)),
    ),
).All(ctx)

This translates to the following SQL query:

select *
from records
where foreign_key_column in (select id from fk_table where id = oldFK)
or foreign_key_column in (select id from fk_table where id = newFK)

And this is the second one: it looks for “a record that has a foreign key with value X or Y”:

client.Record.Query().Where(
    record.HasForeignKeyWith(
        fk.IDIn(oldFK, newFK),
    ),
).All(ctx)

This translates into the following SQL query:

select *
from records
where foreign_key_column in
(select id from fk_table where id in (oldFK, newFK))

The second query utilizes the index we have on the foreign_key_column in the records table.

To handle these cases, ent lets you spell out in code the exact query you wish to execute, but this is reserved for the delicate cases and requires more expertise in the underlying database.
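As an illustration of that escape hatch, here is a minimal sketch of a custom predicate that operates directly on the SQL selector; the package name, column name, and generated package path reuse the hypothetical names from the example above.

package records // hypothetical package for the sketch

import (
    "entgo.io/ent/dialect/sql"

    "example.com/app/ent/predicate" // generated predicate package (path is an assumption)
)

// foreignKeyIn spells out the exact WHERE clause we want ent to produce.
func foreignKeyIn(ids ...interface{}) predicate.Record {
    return predicate.Record(func(s *sql.Selector) {
        s.Where(sql.In(s.C("foreign_key_column"), ids...))
    })
}

// Usage: client.Record.Query().Where(foreignKeyIn(oldFK, newFK)).All(ctx)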

Our takeaways 

Since we moved our flows service to ent, we have fallen in love with it, and today many of our microservices use ent to model and query their data. For basically any new microservice we create, we ask ourselves: “do we really need anything other than ent?”

At first glance, most of our services don’t require a graph data model. But after further thought, we found that our code can still benefit from such an approach.

Let’s say service X saves records per user, and each record has a “user_id” member. We used ent to extract this into a “user” entity that is “connected” to the record. This gave us an easy way to tie all user-related resources together in our code and a simple way to manage them. Operations such as modifying or deleting users and their data suddenly became simple for new developers.
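As a sketch of what that extraction looks like in ent, here is a hypothetical User schema with an edge to its records; the field names and cascade behavior are illustrative assumptions, not our production model.

package schema

import (
    "entgo.io/ent"
    "entgo.io/ent/dialect/entsql"
    "entgo.io/ent/schema/edge"
    "entgo.io/ent/schema/field"
)

// User is a hypothetical entity extracted from a bare "user_id" column.
type User struct {
    ent.Schema
}

// Fields of the User.
func (User) Fields() []ent.Field {
    return []ent.Field{
        field.String("external_id").Unique(), // hypothetical identifier
    }
}

// Edges of the User.
func (User) Edges() []ent.Edge {
    return []ent.Edge{
        // Connecting records to the user turns "delete a user and all their data"
        // into a single cascading operation instead of scattered queries.
        edge.To("records", Record.Type).
            Annotations(entsql.Annotation{OnDelete: entsql.Cascade}),
    }
}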

Ent has a lot more to offer that I didn’t cover here and has huge benefits for developers:

  • Ent-cache to automatically cache queries

  • Extensive migration mechanism – allows running business logic inside the migration step to perform complicated migrations

  • GraphQL adapters – you know the feeling that now is not the right time to move to GraphQL? Well not anymore.

ORMs and entity frameworks are powerful tools for developers, and I urge any of you who don’t use them to take a look. We chose ent because it fit our needs, but any other ORM can also solve many of these problems.

Using ent increased our development velocity, lowered the entrance barrier for new developers, and let us catch bugs at compile time rather than at runtime.

About Seekret

Seekret's API governance platform empowers API-first practices by giving engineering teams the control they need to manage APIs, increase velocity, and reduce developer toil.

References

  1. https://microservices.io/patterns/data/database-per-service.html

  2. https://entgo.io

  3. https://neo4j.com/