Senior Data Engineer, Online Genealogy Service
About the vacancy
The client is an international company that provides an online genealogy service that helps its clients understand their past and family history.
We are looking for a Data Engineer to join a team that maintains the data workflow and ingestion of scanned newspaper image data. This involves handling high data throughput reliably and consistently.
The specialist will help the existing team manage the file systems, databases, and data ingestion into Solr, and maintain the internal, web-based tools that the client’s Quality Control team uses to validate images before they are published.
There is also an element of DevOps and Systems Administration - the team works with a significant number of physical and virtual servers, handling deployment pipelines, etc.
In the coming months, the client will investigate using Machine Learning techniques to improve the quality of their OCR. Some ML techniques will likely be applied over the course of this project, but this is expected to make up only part of the role.
There are multiple teams consisting of 5-7 people. The teams include DataArt engineers and stakeholders from the client side working in a mature Agile environment.
We hire people not for a project but for the company. If the project (or your work on it) ends, you move to another project or to a paid “Idle”.
Responsibilities
- Managing file systems; managing databases; managing data ingest into Solr and managing Solr at scale
- Handling large amounts of XML
- Management of internal, web-based tools
- Potential to use ML techniques as a part of the process of improving the quality of their OCR, possibly after a few months
Requirements
- Experience with SQL (MySQL) databases and handling large amounts of data
- Comfortable working from the terminal in Linux/Unix (Ubuntu)
- Good knowledge of at least one programming language (Ruby, Python etc.)
- A hands-on approach to getting stuff done
- A curiosity to learn and widen your skillset
Technology stack
- Rails (for internal web-based tools)
- Experience with ZFS, XML
- TensorFlow (not extensively so far – used for ML work)
- AWS/Azure (used from time to time)
- Experience with Apache Solr
Would be a plus
- Focus on quality, with testing experience and a willingness to pair collaboratively
- Background in DevOps/Systems Administration
- Experience with Docker, Git, Kubernetes
- Experience with XML processing
- Working knowledge of, or an interest in, image data processing
Learn more about our policy of equal opportunities in employment
Work at DataArt is
Our relationships with clients and colleagues are based on mutual respect, no matter what differences we may have.
- Long-term partnership
- Respect for individuality and freedom of expression
- Flexible schedule, comfortable offices, and the ability to work from home
- Market-driven compensation and health care
- High-quality internal administrative services
Get the opportunity to unleash your potential in DataArt's ecosystem
- Highly qualified team
- Communities and knowledge sharing
- English classes
- Internal educational system
Freedom to explore and opportunities to get new experience and knowledge. Constant willingness to change
- Work contract with DataArt, not project-based employment
- Flat structure
- Minimum rules
- Rules and policies change with context, while values stay the same
- Easy movement among offices and opportunities for relocation
The ability to count on each other and the willingness to trust people lie at the heart of relationships at DataArt
- Management via context, bottom-up decision making. We avoid micromanagement
- Clear equal rules and policies
- Fair management
- No ranking vs others, no regular reassessments. Fair seniority assessment