A collection of media I plan to read, am currently reading, or have read.
Public
Source Issue
Paywall
arXiv
Content
Data Science
- Data Synthesis & Privacy
- Bayesian ML
- Deep Learning
- Professional
- Style
Privacy: Issues, Regulation, and Philosophy
Recreation
- Fiction
- Nonfiction
Data Science
Data Synthesis & Privacy
- Little, C. (2021, December 1-3). Generative Adversarial Networks for Synthetic Data Generation: A Comparative Study. Expert Meeting on Statistical Data Confidentiality. Poznań, Poland. Available at https://unece.org/statistics/documents/2021/12/working-documents/generative-adversarial-networks-synthetic-data.
- Kotelnikov, A., Baranchuk, D., Rubachev, I., & Babenko, A. (2022). TabDDPM: Modelling Tabular Data with Diffusion Models. Available at https://arxiv.org/abs/2209.15421.
- Patki, N., Wedge, R., & Veeramachaneni, K. (2016). The Synthetic Data Vault. In 2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA) (pp. 399-410). Available at https://dai.lids.mit.edu/wp-content/uploads/2018/03/SDV.pdf.
- Park, N., Mohammadi, M., Gorde, K., Jajodia, S., Park, H., & Kim, Y. (2018). Data Synthesis Based on Generative Adversarial Networks. Proc. VLDB Endow., 11(10), 1071–1083. https://doi.org/10.14778/3231751.3231757. Available at https://arxiv.org/abs/1806.03384.
- Ping, H., Stoyanovich, J., & Howe, B. (2017). DataSynthesizer: Privacy-Preserving Synthetic Datasets. In Proceedings of the 29th International Conference on Scientific and Statistical Database Management. Association for Computing Machinery. Available at https://dl.acm.org/doi/10.1145/3085504.3091117.
- Rocher, L., Hendrickx, J. M., & de Montjoye, Y.-A. (2019). Estimating the success of re-identifications in incomplete datasets using generative models. Nature Communications, 10, 3069. Available at https://www.nature.com/articles/s41467-019-10933-3.
- Taub, J., & Elliot, M. (2019, October 29-31). Creating the Best Risk-Utility Profile: The Synthetic Data Challenge. Work Session on Statistical Data Confidentiality. The Hague, the Netherlands. Available at https://unece.org/statistics/events/SDC2019.
- Taub, J., Elliot, M., & Sakshaug, J. (2020). The Impact of Synthetic Data Generation on Data Utility with Application to the 1991 UK Samples of Anonymised Records. Transactions on Data Privacy, 13, 1-23. Available at https://www.research.manchester.ac.uk/portal/files/170745850/tdp.a306a18.pdf.
- Weng, L. (2021, July 11). What are Diffusion Models? Lil’Log. Available at https://lilianweng.github.io/posts/2021-07-11-diffusion-models.
- Xu, L., Skoularidou, M., Cuesta-Infante, A., & Veeramachaneni, K. (2019). Modeling Tabular Data Using Conditional GAN. Proceedings of the 33rd International Conference on Neural Information Processing Systems. Article 659, 7335–7345. Available at https://dl.acm.org/doi/10.5555/3454287.3454946.
- Zhang, J., Cormode, G., Procopiuc, C., Srivastava, D., & Xiao, X. (2017). PrivBayes: Private Data Release via Bayesian Networks. ACM Trans. Database Syst., 42(4). Available at https://dl.acm.org/doi/10.1145/3134428.
Bayesian Machine Learning
- Blei, D. M., Kucukelbir, A., & McAuliffe, J. D. (2017). Variational Inference: A Review for Statisticians. Journal of the American Statistical Association, 112(518), 859-877. https://doi.org/10.1080/01621459.2017.1285773. Available at https://arxiv.org/abs/1601.00670.
- Gelman, A. (2006). Multilevel (Hierarchical) Modeling: What It Can and Cannot Do. Technometrics, 48(3), 432-435. https://doi.org/10.1198/004017005000000661. Available at http://www.stat.columbia.edu/~gelman/research/published/multi2.pdf.
Style
- van Rossum, G., Warsaw, B., & Coghlan, N. (2001, July 5). PEP 8 – Style Guide for Python Code. Available at https://peps.python.org/pep-0008.